76 kDa Helicobacter polypeptides and corresponding polynucleotide molecules

ABSTRACT

The invention provides 76 kDa Helicobacter polypeptides, which can be used in vaccination methods for preventing or treating Helicobacter infection, and polynucleotides that encode these polypeptides.

[0001] The invention relates to a family of 76 kDa Helicobacter antigens and corresponding polynucleotide molecules that can be used in methods to prevent or treat Helicobacter infection in mammals, such as humans.

BACKGROUND OF THE INVENTION

[0002] Helicobacter is a genus of spiral, gram-negative bacteria that colonize the gastrointestinal tracts of mammals. Several species colonize the stomach, most notably H. pylori, H. heilmanii, H. felis, and H. mustelae. Although H. pylori is the species most commonly associated with human infection, H. heilmanii and H. felis have also been isolated from humans, but at lower frequencies than H. pylori.

[0003] Helicobacter infects over 50% of adult populations in developed countries and nearly 100% in developing countries and some Pacific rim countries, making it one of the most prevalent infections worldwide.

[0004] Helicobacter is routinely recovered from gastric biopsies of humans with histological evidence of gastritis and peptic ulceration. Indeed, H. pylori is now recognized as an important pathogen of humans, in that the chronic gastritis it causes is a risk factor for the development of peptic ulcer diseases and gastric carcinoma. It is thus highly desirable to develop safe and effective vaccines for preventing and treating Helicobacter infection.

[0005] A number of Helicobacter antigens have been characterized or isolated. These include urease, which is composed of two structural subunits of approximately 30 and 67 kDa (Hu et al., Infect. Immun. 58:992, 1990; Dunn et al., J. Biol. Chem. 265:9464, 1990; Evans et al., Microbial Pathogenesis 10:15, 1991; Labigne et al., J. Bact. 173:1920, 1991); the 87 kDa vacuolar cytotoxin (VacA) (Cover et al., J. Biol. Chem. 267:10570, 1992; Phadnis et al., Infect. Immun. 62:1557, 1994; WO 93/18150); a 128 kDa immunodominant antigen associated with the cytotoxin (CagA, also called TagA; WO 93/18150; U.S. Pat. No. 5,403,924); 13 and 58 kDa heat shock proteins HspA and HspB (Suerbaum et al., Mol. Microbiol. 14:959, 1994; WO 93/18150); a 54 kDa catalase (Hazell et al., J. Gen. Microbiol. 137:57, 1991); a 15 kDa histidine-rich protein (Hpn) (Gilbert et al., Infect. Immun. 63:2682, 1995); a 20 kDa membrane-associated lipoprotein (Kostrcynska et al., J. Bact. 176:5938, 1994); a 30 kDa outer membrane protein (Bölin et al., J. Clin. Microbiol. 33:381, 1995); a lactoferrin receptor (FR 2,724,936); and several porins, designated HopA, HopB, HopC, HopD, and HopE, which have molecular weights of 48-67 kDa (Exner et al., Infect. Immun. 63:1567, 1995; Doig et al., J. Bact. 177:5447, 1995). Some of these proteins have been proposed as potential vaccine antigens. In particular, urease is believed to be a vaccine candidate (WO 94/9823; WO 95/22987; WO 95/3824; Michetti et al., Gastroenterology 107:1002, 1994). Nevertheless, it is thought that several antigens may ultimately be necessary in a vaccine.

SUMMARY OF THE INVENTION

[0006] The invention provides polynucleotide molecules that encode a family of 76 kDa Helicobacter polypeptides, designated GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185, GHPO 1417, and GHPO 1414, which can be used, e.g., in methods to prevent, treat, or diagnose Helicobacter infection. The polypeptides include those having the amino acid sequences shown in SEQ ID NOs:2-22 (even numbers). Those skilled in the art will understand that the invention also includes polynucleotide molecules that encode mutants and derivatives of these polypeptides, which can result from the addition, deletion, or substitution of non-essential amino acids, as is described further below.

[0008] In addition to the polynucleotide molecules described above, the invention includes the corresponding polypeptides (i.e., polypeptides encoded by the polynucleotide molecules of the invention, or fragments thereof), and monospecific antibodies that specifically bind to these polypeptides.

[0009] The present invention has many applications and includes expression cassettes, vectors, and cells transformed or transfected with the polynucleotides of the invention. Accordingly, the present invention provides (i) methods for producing polypeptides of the invention in recombinant host systems and related expression cassettes, vectors, and transformed or transfected cells; (ii) live vaccine vectors, such as poxvirus, Salmonella typhimurium, and Vibrio cholerae vectors, that contain polynucleotides of the invention (such vaccine vectors being useful in, e.g., methods for preventing or treating Helicobacter infection) in combination with a diluent or carrier, and related pharmaceutical compositions and associated therapeutic and/or prophylactic methods; (iii) therapeutic and/or prophylactic methods involving administration of polynucleotide molecules, either in a naked form or formulated with a delivery vehicle, polypeptides or mixtures of polypeptides, or monospecific antibodies of the invention, and related pharmaceutical compositions; (iv) methods for detecting the presence of Helicobacter in biological samples, which can involve the use of polynucleotide molecules, monospecific antibodies, or polypeptides of the invention; and (v) methods for purifying polypeptides of the invention by antibody-based affinity chromatography.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 is an alignment of the predicted amino acid sequences of GHPO 386 (SEQ ID NO:2), GHPO 789 (SEQ ID NO:4), and GHPO 1516 (SEQ ID NO:6), as well as a consensus sequence for the 76 kDa protein family.

[0011] FIG. 2 is an alignment of the predicted amino acid sequences of GHPO 1197 (SEQ ID NO:8), GHPO 1180 (SEQ ID NO:10), GHPO 896 (SEQ ID NO:12), GHPO 711 (SEQ ID NO:14), GHPO 190 (SEQ ID NO:16), GHPO 185 (SEQ ID NO:18), GHPO 1417 (SEQ ID NO:20), and GHPO 1414 (SEQ ID NO:22), as well as a consensus sequence for the 76 kDa protein family.

DETAILED DESCRIPTION

[0012] Open reading frames (ORFs) encoding a family of new, full length, membrane-associated 76 kDa polypeptides, designated GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185, GHPO 1417, and GHPO 1414, have been identified in the H. pylori genome. The amino acid sequences of these proteins are aligned in FIGS. 1 and 2. These polypeptides can be used, for example, in vaccination methods for preventing or treating Helicobacter infection. The polypeptides of the invention are secreted polypeptides that can be produced in their mature forms (i.e., as polypeptides that have been exported through class II or class III secretion pathways) or as precursors that include a signal peptide, which can be removed in the course of excretion/secretion by cleavage at the N-terminal end of the mature form. (The cleavage site is located at the C-terminal end of the signal peptide, adjacent to the mature form.) The cleavage site for the polypeptides of the invention and, thus, the first amino acid of the mature polypeptides, was putatively determined.

[0013] According to a first aspect of the invention, there are provided isolated polynucleotides that encode the precursor and mature forms of Helicobacter GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185, GHPO 1417, and GHPO 1414.

[0014] An isolated polynucleotide of the invention encodes:

[0015] (i) a polypeptide having an amino acid sequence that is homologous to a Helicobacter amino acid sequence of a polypeptide associated with the Helicobacter membrane, the Helicobacter amino acid sequence being selected from the group consisting of the amino acid sequences shown:

[0016] in SEQ ID NO:2, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 689 (GHPO 386);

[0017] in SEQ ID NO:4, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 713 (GHPO 789);

[0018] in SEQ ID NO:6, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 725 (GHPO 1516);

[0019] in SEQ ID NO:8, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 691 (GHPO 1197);

[0020] in SEQ ID NO:10, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 652 (GHPO 1180);

[0021] in SEQ ID NO:12, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 896);

[0022] in SEQ ID NO:14, beginning with an amino acid in any one of positions −21 to 5, preferably in position −21 or position 1, and ending with an amino acid in position 619 (GHPO 711);

[0023] in SEQ ID NO:16, beginning with an amino acid in any one of positions −17 to 5, preferably in position −17 or position 1, and ending with an amino acid in position 635 (GHPO 190);

[0024] in SEQ ID NO:18, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 626 (GHPO 185);

[0025] in SEQ ID NO:20, beginning with an amino acid in any one of positions −16 to 5, preferably in position −16 or position 1, and ending with an amino acid in position 467 (GHPO 1417); and

[0026] in SEQ ID NO:22, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 1414); or

[0027] (ii) a derivative of the polypeptide.

[0028] The term “isolated polynucleotide” is defined as a polynucleotide that is removed from the environment in which it naturally occurs. For example, a naturally-occurring DNA molecule present in the genome of a living bacterium or as part of a gene bank is not isolated, but the same molecule, separated from the remaining part of the bacterial genome, as a result of, e.g., a cloning event (amplification), is “isolated.” Typically, an isolated DNA molecule is free from DNA regions (e.g., coding regions) with which it is immediately contiguous, at the 5′ or 3′ ends, in the naturally occurring genome. Such isolated polynucleotides can be part of a vector or a composition and still be isolated, as such a vector or composition is not part of its natural environment.

[0029] A polynucleotide of the invention can consist of RNA or DNA (e.g., cDNA, genomic DNA, or synthetic DNA), or modifications or combinations of RNA or DNA. The polynucleotide can be double-stranded or single-stranded and, if single-stranded, can be the coding (sense) strand or the non-coding (anti-sense) strand. The sequences that encode polypeptides of the invention, as shown in SEQ ID NOs:2-22 (even numbers), can be (a) the coding sequence as shown in SEQ ID NOs:1-21 (odd numbers); (b) a ribonucleotide sequence derived by transcription of (a); or (c) a different coding sequence that, as a result of the redundancy or degeneracy of the genetic code, encodes the same polypeptides as the polynucleotide molecules having the sequences illustrated in SEQ ID NOs:1-21 (odd numbers). The polypeptides of the invention can be ones that are naturally secreted or excreted by, e.g., H. felis, H. mustelae, H. heilmanii, or H. pylori.

[0030] By “polypeptide” or “protein” is meant any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). Both terms are used interchangeably in the present application.

[0031] By “homologous amino acid sequence” is meant an amino acid sequence that differs from an amino acid sequence shown in any of SEQ ID NOs:2-22 (even numbers), or an amino acid sequence encoded by the nucleotide sequence of any of SEQ ID NOs:1-21 (odd numbers), by one or more non-conservative amino acid substitutions, deletions, or additions located at positions at which they do not destroy the specific antigenicity of the polypeptide. Preferably, such a sequence is at least 75%, more preferably at least 80%, and most preferably at least 90% identical to an amino acid sequence shown in any of SEQ ID NOs:2-22 (even numbers).

[0032] Homologous amino acid sequences include sequences that are identical or substantially identical to an amino acid sequence as shown in any of SEQ ID NOs:2-22 (even numbers). By “amino acid sequence that is substantially identical” is meant a sequence that is at least 90%, preferably at least 95%, more preferably at least 97%, and most preferably at least 99% identical to an amino acid sequence of reference and that differs from the sequence of reference, if at all, by a majority of conservative amino acid substitutions.

[0033] Conservative amino acid substitutions typically include substitutions among amino acids of the same class. These classes include, for example, amino acids having uncharged polar side chains, such as asparagine, glutamine, serine, threonine, and tyrosine; amino acids having basic side chains, such as lysine, arginine, and histidine; amino acids having acidic side chains, such as aspartic acid and glutamic acid; and amino acids having nonpolar side chains, such as glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan, and cysteine.
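By way of illustration only, the side-chain classes listed above can be written as a small lookup table. The following Python sketch assumes standard one-letter amino acid codes; the names SIDE_CHAIN_CLASSES, side_chain_class, and is_conservative are hypothetical and do not come from the application.

```python
# Side-chain classes exactly as enumerated in the paragraph above,
# expressed with standard one-letter amino acid codes.
SIDE_CHAIN_CLASSES = {
    "uncharged polar": set("NQSTY"),   # Asn, Gln, Ser, Thr, Tyr
    "basic": set("KRH"),               # Lys, Arg, His
    "acidic": set("DE"),               # Asp, Glu
    "nonpolar": set("GAVLIPFMWC"),     # Gly, Ala, Val, Leu, Ile, Pro, Phe, Met, Trp, Cys
}


def side_chain_class(residue):
    """Return the name of the class that contains the given residue."""
    code = residue.upper()
    for name, members in SIDE_CHAIN_CLASSES.items():
        if code in members:
            return name
    raise ValueError(f"not one of the 20 listed amino acids: {residue!r}")


def is_conservative(original, replacement):
    """True when both residues fall in the same side-chain class.

    Identical residues trivially share a class, so they also return True.
    """
    return side_chain_class(original) == side_chain_class(replacement)
```

Under this grouping, a lysine-to-arginine change would count as conservative, whereas an aspartic acid-to-lysine change would not.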

[0034] Homology can be measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). Similar amino acid sequences are aligned to obtain the maximum degree of homology (i.e., identity). To this end, it may be necessary to artificially introduce gaps into the sequence. Once the optimal alignment has been set up, the degree of homology (i.e., identity) is established by recording all of the positions in which the amino acids of both sequences are identical, relative to the total number of positions.
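The identity calculation described above can be sketched as follows. This is a minimal illustration, not the algorithm of the cited software package: it assumes the two sequences have already been optimally aligned, that gaps are written as "-", and that "the total number of positions" means the full alignment length including gapped columns. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Identical positions are counted and divided by the total number of
    aligned positions (gapped columns included; this is an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical six-column alignment with one introduced gap.
example = percent_identity("MKT-LL", "MKTALV")
```

In the hypothetical alignment, four of six columns are identical, so the result is about 66.7%, which would fall below the 75% threshold given in paragraph [0031].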

[0035] Homologous polynucleotide sequences are defined in a similar way. Preferably, a homologous sequence is one that is at least 45%, more preferably at least 60%, and most preferably at least 85% identical to a coding sequence of any of SEQ ID NOs:1-21 (odd numbers).

[0036] Polypeptides having a sequence homologous to one of the sequences shown in SEQ ID NOs:2-22 (even numbers) include naturally-occurring allelic variants, as well as mutants or any other non-naturally occurring variants that are analogous, in terms of antigenicity, to a polypeptide having a sequence as shown in SEQ ID NOs:2-22 (even numbers).

[0037] As is known in the art, an allelic variant is an alternate form of a polypeptide that is characterized as having a substitution, deletion, or addition of one or more amino acids that does not alter the biological function of the polypeptide. By “biological function” is meant a function of the polypeptide in the cells in which it naturally occurs, even if the function is not necessary for the growth or survival of the cells. For example, the biological function of a porin is to allow the entry into cells of compounds present in the extracellular medium. The biological function is distinct from the antigenic function. A polypeptide can have more than one biological function.

[0038] Allelic variants are very common in nature. For example, a bacterial species, e.g., H. pylori, is usually represented by a variety of strains that differ from each other by minor allelic variations. Indeed, a polypeptide that fulfills the same biological function in different strains can have an amino acid sequence that is not identical in each of the strains. Such an allelic variation can be equally reflected at the polynucleotide level.

[0039] Support for the use of allelic variants of polypeptide antigens comes from, e.g., studies of the Helicobacter urease antigen. The amino acid sequence of Helicobacter urease varies widely from species to species, yet cross-species protection occurs, indicating that the urease molecule, when used as an immunogen, is highly tolerant of amino acid variations. Even among different strains of the single species H. pylori, there are amino acid sequence variations.

[0040] For example, although the amino acid sequences of the UreA and UreB subunits of H. pylori and H. felis ureases differ from one another by 26.5% and 11.8%, respectively (Ferrero et al., Molecular Microbiology 9(2):323-333, 1993), it has been shown that H. pylori urease protects mice from H. felis infection (Michetti et al., Gastroenterology 107:1002, 1994). In addition, it has been shown that the individual structural subunits of urease, UreA and UreB, which contain distinct amino acid sequences, are both protective antigens against Helicobacter infection (Michetti et al., supra). Similarly, Cuenca et al. (Gastroenterology 110:1770, 1996) showed that therapeutic immunization of H. mustelae-infected ferrets with H. pylori urease was effective at eradicating H. mustelae infection. Further, several urease variants have been reported to be effective vaccine antigens, including, e.g., recombinant UreA+UreB apoenzyme expressed from pORV142 (UreA and UreB sequences derived from H. pylori strain CPM630; Lee et al., J. Infect. Dis. 172:161, 1995); recombinant UreA+UreB apoenzyme expressed from pORV214 (UreA and UreB sequences differ from H. pylori strain CPM630 by one and two amino acid changes, respectively; Lee et al., supra, 1995); a UreA-glutathione-S-transferase fusion protein (UreA sequence from H. pylori strain ATCC 43504; Thomas et al., Acta Gastro-Enterologica Belgica 56:54, 1993); UreA+UreB holoenzyme purified from H. pylori strain NCTC 11637 (Marchetti et al., Science 267:1655, 1995); a UreA-MBP fusion protein (UreA from H. pylori strain 85P; Ferrero et al., Infection and Immunity 62:4981, 1994); a UreB-MBP fusion protein (UreB from H. pylori strain 85P; Ferrero et al., supra); a UreA-MBP fusion protein (UreA from H. felis strain ATCC 49179; Ferrero et al., supra); a UreB-MBP fusion protein (UreB from H. felis strain ATCC 49179; Ferrero et al., supra); and a 37 kDa fragment of UreB containing amino acids 220-569 (Dore-Davin et al., “A 37 kD fragment of UreB is sufficient to confer protection against Helicobacter felis infection in mice”). Finally, Thomas et al. (supra) showed that oral immunization of mice with crude sonicates of H. pylori protected mice from subsequent challenge with H. felis.

[0041] Polynucleotides, e.g., DNA molecules, encoding allelic variants can easily be obtained by polymerase chain reaction (PCR) amplification of genomic bacterial DNA extracted by conventional methods. This involves the use of synthetic oligonucleotide primers matching sequences that are upstream and downstream of the 5′ and 3′ ends of the coding region. Suitable primers can be designed based on the nucleotide sequence information provided in SEQ ID NOs:1-21 (odd numbers). Typically, a primer consists of 10 to 40, preferably 15 to 25 nucleotides. It can also be advantageous to select primers containing C and G nucleotides in proportions sufficient to ensure efficient hybridization, e.g., an amount of C and G nucleotides of at least 40%, preferably 50%, of the total nucleotide amount. Those skilled in the art can readily design primers that can be used to isolate the polynucleotides of the invention from different Helicobacter strains.
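As a rough illustration of the primer guidelines above (10 to 40 nucleotides, with C and G making up at least 40% of the total), a candidate primer could be screened as sketched below. The function names, the default thresholds chosen from the stated ranges, and the example sequence are all hypothetical; the example is not one of the primers of SEQ ID NOs:23-55.

```python
def gc_fraction(primer):
    """Fraction of C and G nucleotides in a primer (0.0 to 1.0)."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(primer, min_len=10, max_len=40, min_gc=0.40):
    """Screen a primer against the length and G+C guidelines in the text.

    The defaults reflect the broader ranges stated above; passing
    min_len=15, max_len=25, min_gc=0.50 applies the preferred values.
    """
    return {
        "length": len(primer),
        "gc_fraction": gc_fraction(primer),
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_ok": gc_fraction(primer) >= min_gc,
    }


# Hypothetical 20-nucleotide candidate, for illustration only.
report = check_primer("ATGGCTAGCCTGAAGGATCC")
```

A screen of this kind only checks the two numerical criteria named in the paragraph; it says nothing about specificity or secondary structure, which would still need to be assessed by conventional means.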

[0042] As an example, primers useful for cloning a DNA molecule encoding a polypeptide having the amino acid sequence of unprocessed GHPO 386 (SEQ ID NO:2), including a signal peptide, are shown in SEQ ID NO:23 (matching at the 5′ end) and in SEQ ID NO:25 (matching at the 3′ end). Primers useful for cloning a DNA molecule encoding a polypeptide having the amino acid sequence of mature GHPO 386 (amino acids 1-689 of SEQ ID NO:2), lacking a signal peptide, are shown in SEQ ID NO:24 (matching at the 5′ end) and in SEQ ID NO:25 (matching at the 3′ end). Experimental conditions for carrying out PCR can readily be determined by one skilled in the art, and illustrations of carrying out PCR are provided in Examples 3 and 4.

[0043] Thus, the first aspect of the invention includes:

[0044] (i) isolated DNA molecules that can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter, e.g., H. pylori, genome using either:

[0045] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:23, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25 (unprocessed GHPO 386);

[0046] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:26, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28 (unprocessed GHPO 789);

[0047] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:29, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31 (unprocessed GHPO 1516);

[0048] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:32, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34 (unprocessed GHPO 1197);

[0049] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:35, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37 (unprocessed GHPO 1180);

[0050] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:38, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40 (unprocessed GHPO 896);

[0051] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:41, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43 (unprocessed GHPO 711);

[0052] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:44, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46 (unprocessed GHPO 190);

[0053] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:47, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49 (unprocessed GHPO 185);

[0054] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:50, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52 (unprocessed GHPO 1417); or

[0055] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:53, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55 (unprocessed GHPO 1414); and

[0056] (ii) isolated DNA molecules that can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter, e.g., H. pylori, genome using either:

[0057] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:24, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25 (mature GHPO 386);

[0058] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28 (mature GHPO 789);

[0059] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31 (mature GHPO 1516);

[0060] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34 (mature GHPO 1197);

[0061] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37 (mature GHPO 1180);

[0062] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40 (mature GHPO 896);

[0063] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43 (mature GHPO 711);

[0064] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46 (mature GHPO 190);

[0065] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49 (mature GHPO 185);

[0066] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52 (mature GHPO 1417); or

[0067] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55 (mature GHPO 1414).
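The primer assignments enumerated in (i) and (ii) above follow a regular pattern: each polypeptide is allotted three consecutive SEQ ID NOs (5′ primer for the unprocessed form, 5′ primer for the mature form, and a shared 3′ primer), starting at SEQ ID NO:23. The following Python sketch merely restates that pattern as a lookup. The names are hypothetical, and the pairing of each odd-numbered nucleotide sequence with the next even-numbered polypeptide sequence is an assumption based on the usual sequence-listing convention, not something stated in so many words above.

```python
# Polypeptides in the order used throughout the text.
PROTEINS = [
    "GHPO 386", "GHPO 789", "GHPO 1516", "GHPO 1197", "GHPO 1180",
    "GHPO 896", "GHPO 711", "GHPO 190", "GHPO 185", "GHPO 1417",
    "GHPO 1414",
]


def seq_ids(name):
    """SEQ ID NOs associated with one polypeptide, per the lists above."""
    index = PROTEINS.index(name)      # raises ValueError for unknown names
    primer_base = 23 + 3 * index      # three primer entries per polypeptide
    return {
        "nucleotide": 2 * index + 1,  # assumed pairing: odd number precedes even
        "polypeptide": 2 * index + 2,
        "five_prime_unprocessed": primer_base,
        "five_prime_mature": primer_base + 1,
        "three_prime": primer_base + 2,
    }
```

For instance, looking up GHPO 896 in this table gives SEQ ID NO:12 for the polypeptide and SEQ ID NOs:38, 39, and 40 for the three primers, matching paragraphs [0021], [0050], and [0062].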

[0068] In the sequences of SEQ ID NOs:23-55, the letter “N” denotes a restriction endonuclease digestion site that contains, typically, 4 to 6 nucleotides. For example, the sequences 5′-GGATCC-3′ (BamHI) or 5′-CTCGAG-3′ (XhoI) can be used. Restriction sites can be selected by those skilled in the art so that the amplified DNA can be conveniently cloned into an appropriately digested vector, such as a plasmid.

[0069] Useful homologs that do not occur naturally can be designed using known methods for identifying regions of an antigen that are likely to be tolerant of amino acid sequence changes and/or deletions. For example, sequences of the antigen from different species can be compared to identify conserved sequences.

[0070] Polypeptide derivatives that are encoded by polynucleotides of the invention include, e.g., fragments, polypeptides having large internal deletions derived from full-length polypeptides, and fusion proteins. Polypeptide fragments of the invention can be derived from a polypeptide having a sequence homologous to the sequences of any of SEQ ID NOs:2-22 (even numbers), to the extent that the fragments retain the substantial antigenicity of the parent polypeptide (specific antigenicity). Polypeptide derivatives can also be constructed by large internal deletions that remove a substantial part of the parent polypeptide, while retaining specific antigenicity. Generally, polypeptide derivatives should be at least about 12 amino acids in length to maintain antigenicity. Advantageously, they can be at least 20 amino acids, preferably at least 50 amino acids, more preferably at least 75 amino acids, and most preferably at least 100 amino acids in length.

[0071] Useful polypeptide derivatives, e.g., polypeptide fragments, can be designed using computer-assisted analysis of amino acid sequences in order to identify sites in protein antigens having potential as surface-exposed, antigenic regions (Hughes et al., Infect. Immun. 60(9):3497, 1992). For example, the Laser Gene Program from DNA Star can be used to obtain hydrophilicity, antigenic index, and intensity index plots for the polypeptides of the invention. This program can also be used to obtain information about homologies of the polypeptides with known protein motifs. One skilled in the art can readily use the information provided in such plots to select peptide fragments for use as vaccine antigens. For example, fragments spanning regions of the plots in which the antigenic index is relatively high can be selected. One can also select fragments spanning regions in which both the antigenic index and the intensity plots are relatively high. Fragments containing conserved sequences, particularly hydrophilic conserved sequences, can also be selected.

[0072] Polypeptide fragments and polypeptides having large internal deletions can be used for revealing epitopes that are otherwise masked in the parent polypeptide and that may be of importance for inducing a protective T cell-dependent immune response. Deletions can also remove immunodominant regions of high variability among strains.

[0073] It is an accepted practice in the field of immunology to use fragments and variants of protein immunogens as vaccines, as all that is required to induce an immune response to a protein is a small (e.g., 8 to 10 amino acids) immunogenic region of the protein. This has been done for a number of vaccines against pathogens other than Helicobacter. For example, short synthetic peptides corresponding to surface-exposed antigens of pathogens such as murine mammary tumor virus (peptide containing 11 amino acids; Dion et al., Virology 179:474-477, 1990), Semliki Forest virus (peptide containing 16 amino acids; Snijders et al., J. Gen. Virol. 72:557-565, 1991), and canine parvovirus (2 overlapping peptides, each containing 15 amino acids; Langeveld et al., Vaccine 12(15):1473-1480, 1994) have been shown to be effective vaccine antigens against their respective pathogens.

[0074] Polynucleotides encoding polypeptide fragments and polypeptides having large internal deletions can be constructed using standard methods (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc., 1994), for example, by PCR, including inverse PCR, by restriction enzyme treatment of the cloned DNA molecules, or by the method of Kunkel et al. (Proc. Natl. Acad. Sci. USA 82:448, 1985; biological material available at Stratagene).

[0075] A polypeptide derivative can also be produced as a fusion polypeptide that contains a polypeptide or a polypeptide derivative of the invention fused, e.g., at the N- or C-terminal end, to any other polypeptide (hereinafter referred to as a peptide tail). Such a product can be easily obtained by translation of a genetic fusion, i.e., a hybrid gene. Vectors for expressing fusion polypeptides are commercially available, and include the pMal-c2 or pMal-p2 systems of New England Biolabs, in which the peptide tail is a maltose binding protein, the glutathione-S-transferase system of Pharmacia, or the His-Tag system available from Novagen. These and other expression systems provide convenient means for further purification of polypeptides and derivatives of the invention.

[0076] Another particular example of fusion polypeptides included in the invention is a polypeptide or polypeptide derivative of the invention fused to a polypeptide having adjuvant activity, such as, e.g., subunit B of either cholera toxin or E. coli heat-labile toxin. Several possibilities can be used for producing such fusion proteins. First, the polypeptide of the invention can be fused to the N-terminal end or, preferably, to the C-terminal end of the polypeptide having adjuvant activity. Second, a polypeptide fragment of the invention can be fused within the amino acid sequence of the polypeptide having adjuvant activity. Spacer sequences can also be included, if desired.

[0077] As stated above, the polynucleotides of the invention encode Helicobacter polypeptides in precursor or mature form. They can also encode hybrid precursors containing heterologous signal peptides, which can mature into polypeptides of the invention. By “heterologous signal peptide” is meant a signal peptide that is not found in the naturally-occurring precursor of a polypeptide of the invention.

[0078] A polynucleotide of the invention hybridizes, preferably under stringent conditions, to a polynucleotide having a sequence as shown in any of SEQ ID NOs:1-21 (odd numbers). Hybridization procedures are, e.g., described by Ausubel et al. (supra); Silhavy et al. (Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1984); and Davis et al. (A Manual for Genetic Engineering: Advanced Bacterial Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1980). Important parameters that can be considered for optimizing hybridization conditions are reflected in the following formula, which facilitates calculation of the melting temperature (Tm), which is the temperature above which two complementary DNA strands separate from one another (Casey et al., Nucl. Acid Res. 4:1539, 1977): Tm=81.5+0.5×(% G+C)+1.6 log(positive ion concentration)−0.6×(% formamide). Under appropriate stringency conditions, hybridization temperature (Th) is approximately 20 to 40° C., 20 to 25° C., or, preferably, 30 to 40° C. below the calculated Tm. Those skilled in the art will understand that optimal temperature and salt conditions can be readily determined empirically in preliminary experiments using conventional procedures. For example, stringent conditions can be achieved, both for pre-hybridizing and hybridizing incubations, (i) within 4-16 hours at 42° C., in 6× SSC containing 50% formamide or (ii) within 4-16 hours at 65° C. in an aqueous 6× SSC solution (1 M NaCl, 0.1 M sodium citrate (pH 7.0)). For polynucleotides containing 30 to 600 nucleotides, the above formula is used and then is corrected by subtracting (600/polynucleotide size in base pairs). Stringency conditions are defined by a Th that is 5 to 10° C. below Tm.
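The melting-temperature relation given above can be evaluated numerically as sketched below. This is an illustration under stated assumptions: the coefficients are copied exactly as printed in the text (other published versions of this relation use different values, so the numbers should be treated as illustrative), the logarithm is assumed to be base 10, and the positive ion concentration is assumed to be in mol/L. The function names are hypothetical.

```python
import math


def melting_temperature(gc_percent, ion_molar, formamide_percent=0.0,
                        length_bp=None):
    """Tm in degrees C using the formula as printed in the text.

    Assumptions: log is base 10; ion_molar is in mol/L. For
    polynucleotides of 30 to 600 nucleotides, 600/length is subtracted
    as the text describes.
    """
    tm = (81.5
          + 0.5 * gc_percent
          + 1.6 * math.log10(ion_molar)
          - 0.6 * formamide_percent)
    if length_bp is not None and 30 <= length_bp <= 600:
        tm -= 600.0 / length_bp
    return tm


def hybridization_window(tm, below_min=20.0, below_max=40.0):
    """Range of hybridization temperatures Th lying 20 to 40 C below Tm."""
    return (tm - below_max, tm - below_min)
```

The sketch makes the qualitative dependencies in the formula easy to see: raising the formamide percentage lowers the calculated Tm, raising the G+C content increases it, and the length correction matters most for the shortest polynucleotides in the 30 to 600 nucleotide range.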

[0079] Hybridization conditions with oligonucleotides shorter than 20-30 bases do not precisely follow the rules set forth above. In such cases, the formula for calculating the Tm is as follows: Tm = 4 × (G+C) + 2 × (A+T). For example, an 18 nucleotide fragment of 50% G+C would have an approximate Tm of 54° C.
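The short-oligonucleotide rule above can be sketched in a few lines of Python. The function name and the 18-nucleotide example sequence are hypothetical; the sequence is arbitrary, is not taken from any SEQ ID NO, and was chosen only because it contains nine G or C residues and nine A or T residues, matching the 50% G+C example in the text.

```python
def short_oligo_tm(sequence):
    """Return Tm in degrees C as 4 x (G+C) + 2 x (A+T) for a short DNA oligonucleotide."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # Arbitrary 18-mer with 9 G/C and 9 A/T residues (illustration only).
    print(short_oligo_tm("ATGCATGCATGCATGCGA"))  # prints 54
```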

[0080] A polynucleotide molecule of the invention, containing RNA, DNA, or modifications or combinations thereof, can have various applications. For example, a polynucleotide molecule can be used (i) in a process for producing the encoded polypeptide in a recombinant host system, (ii) in the construction of vaccine vectors such as poxviruses, which are further used in methods and compositions for preventing and/or treating Helicobacter infection, (iii) as a vaccine agent, in a naked form or formulated with a delivery vehicle, and (iv) in the construction of attenuated Helicobacter strains that can over-express a polynucleotide of the invention or express it in a non-toxic, mutated form.

[0081] According to a second aspect of the invention, there is therefore provided (i) an expression cassette containing a polynucleotide molecule of the invention placed under the control of elements (e.g., a promoter) required for expression; (ii) an expression vector containing an expression cassette of the invention; (iii) a procaryotic or eucaryotic cell transformed or transfected with an expression cassette and/or vector of the invention, as well as (iv) a process for producing a polypeptide or polypeptide derivative encoded by a polynucleotide of the invention, which involves culturing a procaryotic or eucaryotic cell transformed or transfected with an expression cassette and/or vector of the invention, under conditions that allow expression of the polynucleotide molecule of the invention, and recovering the encoded polypeptide or polypeptide derivative from the cell culture.

[0082] A recombinant expression system can be selected from procaryotic and eucaryotic hosts. Eucaryotic hosts include, for example, yeast cells (e.g., Saccharomyces cerevisiae or Pichia pastoris), mammalian cells (e.g., COS 1, NIH3T3, or JEG3 cells), arthropod cells (e.g., Spodoptera frugiperda (SF9) cells), and plant cells. Preferably, a procaryotic host such as E. coli is used. Bacterial and eucaryotic cells are available from a number of different sources that are known to those skilled in the art, e.g., the American Type Culture Collection (ATCC; Rockville, Md.).

[0083] The choice of the expression cassette will depend on the host system selected, as well as the features desired for the expressed polypeptide. For example, it may be useful to produce a polypeptide of the invention in a particular lipidated form or any other form. Typically, an expression cassette includes a constitutive or inducible promoter that is functional in the selected host system; a ribosome binding site; a start codon (ATG); if necessary, a region encoding a signal peptide, e.g., a lipidation signal peptide; a polynucleotide molecule of the invention; a stop codon; and, optionally, a 3′ terminal region (translation and/or transcription terminator). The signal peptide-encoding region is adjacent to the polynucleotide of the invention and is placed in the proper reading frame. The signal peptide-encoding region can be homologous or heterologous to the polynucleotide molecule encoding the mature polypeptide and it can be specific to the secretion apparatus of the host used for expression. The open reading frame constituted by the polynucleotide molecule of the invention, alone or together with the signal peptide, is placed under the control of the promoter so that transcription and translation occur in the host system. Promoters and signal peptide-encoding regions are widely known and available to those skilled in the art and include, for example, the promoter of Salmonella typhimurium (and derivatives) that is inducible by arabinose (promoter araB) and is functional in Gram-negative bacteria such as E. coli (U.S. Pat. No. 5,028,530; Cagnon et al., Protein Engineering 4(7):843, 1991); the promoter of the bacteriophage T7 RNA polymerase gene, which is functional in a number of E. coli strains expressing T7 polymerase (U.S. Pat. No. 4,952,496); the OspA lipidation signal peptide; and the RlpB lipidation signal peptide (Takase et al., J. Bact. 169:5692, 1987).

[0084] The expression cassette is typically part of an expression vector, which is selected for its ability to replicate in the chosen expression system. Expression vectors (e.g., plasmids or viral vectors) can be chosen from, for example, those described in Pouwels et al. (Cloning Vectors: A Laboratory Manual, 1985, Supp. 1987) and can be purchased from various commercial sources. Methods for transforming or transfecting host cells with expression vectors are well known in the art and will depend on the host system selected, as described in Ausubel et al. (supra).

[0085] Upon expression, a recombinant polypeptide of the invention (or a polypeptide derivative) is produced and remains in the intracellular compartment, is secreted/excreted in the extracellular medium or in the periplasmic space, or is embedded in the cellular membrane. The polypeptide can then be recovered in a substantially purified form from the cell extract or from the supernatant after centrifugation of the cell culture. Typically, the recombinant polypeptide can be purified by antibody-based affinity purification or by any other method known to a person skilled in the art, such as by genetic fusion to a small affinity-binding domain. Antibody-based affinity purification methods are also available for purifying a polypeptide of the invention extracted from a Helicobacter strain. Antibodies useful for immunoaffinity purification of the polypeptides of the invention can be obtained using methods described below.

[0086] Polynucleotides of the invention can also be used in DNA vaccination methods, using either a viral or bacterial host as gene delivery vehicle (live vaccine vector) or administering the gene in a free form, e.g., inserted into a plasmid. Therapeutic or prophylactic efficacy of a polynucleotide of the invention can be evaluated as is described below.

[0087] Accordingly, in a third aspect of the invention, there is provided (i) a vaccine vector such as a poxvirus, containing a polynucleotide molecule of the invention placed under the control of elements required for expression; (ii) a composition of matter containing a vaccine vector of the invention, together with a diluent or carrier; (iii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a vaccine vector of the invention; (iv) a method for inducing an immune response against Helicobacter in a mammal (e.g., a human; alternatively, the method can be used in veterinary applications for treating or preventing Helicobacter infection of animals, e.g., cats or birds), which involves administering to the mammal an immunogenically effective amount of a vaccine vector of the invention to elicit an immune response, e.g., a protective or therapeutic immune response to Helicobacter; and (v) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, which involves administering a prophylactic or therapeutic amount of a vaccine vector of the invention to an individual in need. Additionally, the third aspect of the invention encompasses the use of a vaccine vector of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection.

[0088] A vaccine vector of the invention can express one or several polypeptides or derivatives of the invention, as well as at least one additional Helicobacter antigen such as a urease apoenzyme or a subunit, fragment, homolog, mutant, or derivative thereof. In addition, it can express a cytokine, such as interleukin-2 (IL-2) or interleukin-12 (IL-12), that enhances the immune response. Thus, a vaccine vector can include additional polynucleotide molecules encoding, e.g., urease subunit A, B, or both, or a cytokine, placed under the control of elements required for expression in a mammalian cell.

[0089] Alternatively, a composition of the invention can include several vaccine vectors, each of which is capable of expressing a polypeptide or derivative of the invention. A composition can also contain a vaccine vector capable of expressing an additional Helicobacter antigen such as urease apoenzyme, a subunit, fragment, homolog, mutant, or derivative thereof, or a cytokine such as IL-2 or IL-12.

[0090] In vaccination methods for treating or preventing infection in a mammal, a vaccine vector of the invention can be administered by any conventional route in use in the vaccine field, for example, to a mucosal (e.g., ocular, intranasal, oral, gastric, pulmonary, intestinal, rectal, vaginal, or urinary tract) surface or via a parenteral (e.g., subcutaneous, intradermal, intramuscular, intravenous, or intraperitoneal) route. Preferred routes depend upon the choice of the vaccine vector. The administration can be achieved in a single dose or repeated at intervals. The appropriate dosage depends on various parameters that are understood by those skilled in the art, such as the nature of the vaccine vector itself, the route of administration, and the condition of the mammal to be vaccinated (e.g., the weight, age, and general health of the mammal).

[0091] Live vaccine vectors that can be used in the invention include viral vectors, such as adenoviruses and poxviruses, as well as bacterial vectors, e.g., Shigella, Salmonella, Vibrio cholerae, Lactobacillus, Bacille bilié de Calmette-Guérin (BCG), and Streptococcus. An example of an adenovirus vector, as well as a method for constructing an adenovirus vector capable of expressing a polynucleotide molecule of the invention, is described in U.S. Pat. No. 4,920,209. Poxvirus vectors that can be used in the invention include, e.g., vaccinia and canary pox viruses, which are described in U.S. Pat. No. 4,722,848 and U.S. Pat. No. 5,364,773, respectively (also see, e.g., Tartaglia et al., Virology 188:217, 1992, for a description of a vaccinia virus vector, and Taylor et al., Vaccine 13:539, 1995, for a description of a canary poxvirus vector). Poxvirus vectors capable of expressing a polynucleotide of the invention can be obtained by homologous recombination, as described in Kieny et al. (Nature 312:163, 1984), so that the polynucleotide of the invention is inserted in the viral genome under appropriate conditions for expression in mammalian cells. Generally, the dose of viral vector vaccine, for therapeutic or prophylactic use, can be from about 1×10⁴ to about 1×10¹¹, advantageously from about 1×10⁷ to about 1×10¹¹, or, preferably, from about 1×10⁷ to about 1×10⁹ plaque-forming units per kilogram. Preferably, viral vectors are administered parenterally, for example, in 3 doses that are 4 weeks apart. Those skilled in the art will recognize that it is preferable to avoid adding a chemical adjuvant to a composition containing a viral vector of the invention and thereby minimize the immune response to the viral vector itself.

[0092] Non-toxicogenic Vibrio cholerae mutant strains that can be used in live oral vaccines are described by Mekalanos et al. (Nature 306:551, 1983) and in U.S. Pat. No. 4,882,278 (strain in which a substantial amount of the coding sequence of each of the two ctxA alleles has been deleted so that no functional cholera toxin is produced); WO 92/11354 (strain in which the irgA locus is inactivated by mutation; this mutation can be combined in a single strain with ctxA mutations); and WO 94/1533 (deletion mutant lacking functional ctxA and attRS1 DNA sequences). These strains can be genetically engineered to express heterologous antigens, as described in WO 94/19482. An effective vaccine dose of a V. cholerae strain capable of expressing a polypeptide or polypeptide derivative encoded by a polynucleotide molecule of the invention can contain, e.g., about 1×10⁵ to about 1×10⁹, preferably about 1×10⁶ to about 1×10⁸ viable bacteria in an appropriate volume for the selected route of administration. Preferred routes of administration include all mucosal routes, but, most preferably, these vectors are administered intranasally or orally.

[0093] Attenuated Salmonella typhimurium strains, genetically engineered for recombinant expression of heterologous antigens, and their use as oral vaccines, are described by Nakayama et al. (Bio/Technology 6:693, 1988) and in WO 92/11361. Preferred routes of administration for these vectors include all mucosal routes. Most preferably, the vectors are administered intranasally or orally.

[0094] Other bacterial strains useful as vaccine vectors are described by High et al. (EMBO 11:1991, 1992) and Sizemore et al. (Science 270:299, 1995; Shigella flexneri); Medaglini et al. (Proc. Natl. Acad. Sci. USA 92:6868, 1995; Streptococcus gordonii); Flynn (Cell. Mol. Biol. 40 (suppl. I):31, 1994), and in WO 88/6626, WO 90/0594, WO 91/13157, WO 92/1796, and WO 92/21376 (Bacille Calmette Guerin). In bacterial vectors, a polynucleotide of the invention can be inserted into the bacterial genome or it can remain in a free state, for example, carried on a plasmid.

[0095] An adjuvant can also be added to a composition containing a bacterial vector vaccine. A number of adjuvants that can be used are known to those skilled in the art. For example, preferred adjuvants can be selected from the list provided below.

[0096] According to a fourth aspect of the invention, there is also provided (i) a composition of matter containing a polynucleotide of the invention, together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a polynucleotide of the invention; (iii) a method for inducing an immune response against Helicobacter, in a mammal, by administering to the mammal an immunogenically effective amount of a polynucleotide of the invention to elicit an immune response, e.g., a protective immune response to Helicobacter; and (iv) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a prophylactic or therapeutic amount of a polynucleotide of the invention to an individual in need of such treatment. Additionally, the fourth aspect of the invention encompasses the use of a polynucleotide of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection. The fourth aspect of the invention preferably includes the use of a polynucleotide molecule placed under conditions for expression in a mammalian cell, e.g., in a plasmid that is unable to replicate in mammalian cells and to substantially integrate into a mammalian genome.

[0097] Polynucleotides (for example, DNA or RNA molecules) of the invention can also be administered as such to a mammal as a vaccine. When a DNA molecule of the invention is used, it can be in the form of a plasmid that is unable to replicate in a mammalian cell and unable to integrate into the mammalian genome. Typically, a DNA molecule is placed under the control of a promoter suitable for expression in a mammalian cell. The promoter can function ubiquitously or tissue-specifically. Examples of non-tissue specific promoters include the early Cytomegalovirus (CMV) promoter (U.S. Pat. No. 4,168,062) and the Rous Sarcoma Virus promoter (Norton et al., Molec. Cell Biol. 5:281, 1985). The desmin promoter (Li et al., Gene 78:243, 1989; Li et al., J. Biol. Chem. 266:6562, 1991; Li et al., J. Biol. Chem. 268:10403, 1993) is tissue-specific and drives expression in muscle cells. More generally, useful promoters and vectors are described, e.g., in WO 94/21797 and by Hartikka et al. (Human Gene Therapy 7:1205, 1996).

[0098] For DNA/RNA vaccination, the polynucleotide of the invention can encode a precursor or a mature form of a polypeptide of the invention. When it encodes a precursor form, the precursor sequence can be homologous or heterologous. In the latter case, a eucaryotic leader sequence can be used, such as the leader sequence of the tissue-type plasminogen activator (tPA).

[0099] A composition of the invention can contain one or several polynucleotides of the invention. It can also contain at least one additional polynucleotide encoding another Helicobacter antigen, such as urease subunit A, B, or both, or a fragment, derivative, mutant, or analog thereof. A polynucleotide encoding a cytokine, such as interleukin-2 (IL-2) or interleukin-12 (IL-12), can also be added to the composition so that the immune response is enhanced. These additional polynucleotides are placed under appropriate control for expression. Advantageously, DNA molecules of the invention and/or additional DNA molecules to be included in the same composition are carried in the same plasmid.

[0100] Standard methods can be used in the preparation of therapeutic polynucleotides of the invention. For example, a polynucleotide can be used in a naked form, free of any delivery vehicles, such as anionic liposomes, cationic lipids, microparticles, e.g., gold microparticles, precipitating agents, e.g., calcium phosphate, or any other transfection-facilitating agent. In this case, the polynucleotide can be simply diluted in a physiologically acceptable solution, such as sterile saline or sterile buffered saline, with or without a carrier. When present, the carrier preferably is isotonic, hypotonic, or weakly hypertonic, and has a relatively low ionic strength, such as provided by a sucrose solution, e.g., a solution containing 20% sucrose.

[0101] Alternatively, a polynucleotide can be associated with agents that assist in cellular uptake. It can be, e.g., (i) complemented with a chemical agent that modifies cellular permeability, such as bupivacaine (see, e.g., WO 94/16737), (ii) encapsulated into liposomes, or (iii) associated with cationic lipids or silica, gold, or tungsten microparticles.

[0102] Anionic and neutral liposomes are well-known in the art (see, e.g., Liposomes: A Practical Approach, RPC New Ed, IRL Press, 1990, for a detailed description of methods for making liposomes) and are useful for delivering a large range of products, including polynucleotides.

[0103] Cationic lipids can also be used for gene delivery. Such lipids include, for example, Lipofectin™, which is also known as DOTMA (N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride), DOTAP (1,2-bis(oleyloxy)-3-(trimethylammonio)propane), DDAB (dimethyldioctadecylammonium bromide), DOGS (dioctadecylamidologlycyl spermine), and cholesterol derivatives. A description of these cationic lipids can be found in EP 187,702, WO 90/11092, U.S. Pat. No. 5,283,185, WO 91/15501, WO 95/26356, and U.S. Pat. No. 5,527,928. Cationic lipids for gene delivery are preferably used in association with a neutral lipid such as DOPE (dioleyl phosphatidylethanolamine; WO 90/11092). Other transfection-facilitating compounds can be added to a formulation containing cationic liposomes. A number of them are described in, e.g., WO 93/18759, WO 93/19768, WO 94/25608, and WO 95/2397. They include, e.g., spermine derivatives useful for facilitating the transport of DNA through the nuclear membrane (see, for example, WO 93/18759) and membrane-permeabilizing compounds such as GALA, Gramicidine S, and cationic bile salts (see, for example, WO 93/19768).

[0104] Gold or tungsten microparticles can also be used for gene delivery, as described in WO 91/359, WO 93/17706, and by Tang et al. (Nature 356:152, 1992). In this case, the microparticle-coated polynucleotides can be injected via intradermal or intraepidermal routes using a needleless injection device (“gene gun”), such as those described in U.S. Pat. No. 4,945,050, U.S. Pat. No. 5,015,580, and WO 94/24263.

[0105] The amount of DNA to be used in a vaccine recipient depends, e.g., on the strength of the promoter used in the DNA construct, the immunogenicity of the expressed gene product, the condition of the mammal intended for administration (e.g., the weight, age, and general health of the mammal), the mode of administration, and the type of formulation. In general, a therapeutically or prophylactically effective dose of from about 1 μg to about 1 mg, preferably from about 10 μg to about 800 μg, and, more preferably, from about 25 μg to about 250 μg, can be administered to human adults. The administration can be achieved in a single dose or repeated at intervals.

[0106] The route of administration can be any conventional route used in the vaccine field. As general guidance, a polynucleotide of the invention can be administered via a mucosal surface, e.g., an ocular, intranasal, pulmonary, oral, intestinal, rectal, vaginal, or urinary tract surface, or via a parenteral route, e.g., by an intravenous, subcutaneous, intraperitoneal, intradermal, intraepidermal, or intramuscular route. The choice of administration route will depend on, e.g., the formulation that is selected. A polynucleotide formulated in association with bupivacaine is advantageously administered into muscle. When a neutral or anionic liposome or a cationic lipid, such as DOTMA, is used, the formulation can be advantageously injected via intravenous, intranasal (for example, by aerosolization), intramuscular, intradermal, or subcutaneous routes. A polynucleotide in a naked form can advantageously be administered via the intramuscular, intradermal, or subcutaneous routes. Although not absolutely required, such a composition can also contain an adjuvant. A systemic adjuvant that does not require concomitant administration in order to exhibit an adjuvant effect is preferable.

[0107] The sequence information provided in the present application enables the design of specific nucleotide probes and primers that can be used in diagnostic methods. Accordingly, in a fifth aspect of the invention, there is provided a nucleotide probe or primer having a sequence found in, or derived by degeneracy of the genetic code from, a sequence shown in SEQ ID NOs:1-21 (odd numbers) or a complementary sequence thereof.

[0108] The term “probe” as used in the present application refers to DNA (preferably single stranded) or RNA molecules (or modifications or combinations thereof) that hybridize under stringent conditions, as defined above, to polynucleotide molecules having sequences homologous to those shown in any of SEQ ID NOs:1-21 (odd numbers), or to a complementary or anti-sense sequence of any of SEQ ID NOs:1-21 (odd numbers). Generally, probes are significantly shorter than the full-length sequences shown in SEQ ID NOs:1-21 (odd numbers). For example, they can contain from about 5 to about 100, preferably from about 10 to about 80 nucleotides. In particular, probes have sequences that are at least 75%, preferably at least 85%, more preferably 95% homologous to a portion of a sequence as shown in SEQ ID NOs:1-21 (odd numbers) or a sequence complementary to such sequences.

[0109] Probes can contain modified bases, such as inosine, methyl-5-deoxycytidine, deoxyuridine, dimethylamino-5-deoxyuridine, or diamino-2,6-purine. Sugar or phosphate residues can also be modified or substituted. For example, a deoxyribose residue can be replaced by a polyamide (Nielsen et al., Science 254:1497, 1991) and phosphate residues can be replaced by ester groups such as diphosphate, alkyl, arylphosphonate, and phosphorothioate esters. In addition, the 2′-hydroxyl group on ribonucleotides can be modified by addition of, e.g., alkyl groups.

[0110] Probes of the invention can be used in diagnostic tests, or as capture or detection probes. Such capture probes can be immobilized on solid supports, directly or indirectly, by covalent means or by passive adsorption. A detection probe can be labeled by a detectable label, for example a label selected from radioactive isotopes; enzymes, such as peroxidase and alkaline phosphatase; enzymes that are able to hydrolyze a chromogenic, fluorogenic, or luminescent substrate; compounds that are chromogenic, fluorogenic, or luminescent; nucleotide base analogs; and biotin.

[0111] Probes of the invention can be used in any conventional hybridization method, such as in dot blot methods (Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1982), Southern blot methods (Southern, J. Mol. Biol. 98:503, 1975), northern blot methods (identical to Southern blot with the exception that RNA is used as a target), or a sandwich method (Dunn et al., Cell 12:23, 1977). As is known in the art, the latter technique involves the use of a specific capture probe and a specific detection probe that have nucleotide sequences that are at least partially different from each other.

[0112] Primers used in the invention usually contain about 10 to 40 nucleotides and are used to initiate enzymatic polymerization of DNA in an amplification process (e.g., PCR), an elongation process, or a reverse transcription method. In a diagnostic method involving PCR, the primers can be labeled.

[0113] Thus, the invention also encompasses (i) a reagent containing a probe of the invention for detecting and/or identifying the presence of Helicobacter in a biological material; (ii) a method for detecting and/or identifying the presence of Helicobacter in a biological material, in which (a) a sample is recovered or derived from the biological material, (b) DNA or RNA is extracted from the material and denatured, and (c) the sample is exposed to a probe of the invention, for example, a capture probe, a detection probe, or both, under stringent hybridization conditions, so that hybridization is detected; and (iii) a method for detecting and/or identifying the presence of Helicobacter in a biological material, in which (a) a sample is recovered or derived from the biological material, (b) DNA is extracted therefrom, (c) the extracted DNA is contacted with at least one, or, preferably, two, primers of the invention, and amplified by the polymerase chain reaction, and (d) an amplified DNA molecule is produced.

[0114] As mentioned above, polypeptides that can be produced by expression of the polynucleotides of the invention can be used as vaccine antigens.

[0115] Accordingly, a sixth aspect of the invention features a substantially purified polypeptide or polypeptide derivative having an amino acid sequence encoded by a polynucleotide of the invention.

[0116] A “substantially purified polypeptide” is defined as a polypeptide that is separated from the environment in which it naturally occurs and/or a polypeptide that is free of most of the other polypeptides that are present in the environment in which it was synthesized. The polypeptides of the invention can be purified from a natural source, such as a Helicobacter strain, or can be produced using recombinant methods.

[0117] Homologous polypeptides or polypeptide derivatives encoded by polynucleotides of the invention can be screened for specific antigenicity by testing cross-reactivity with an antiserum raised against a polypeptide having an amino acid sequence as shown in any of SEQ ID NOs:2-22 (even numbers). Briefly, a monospecific hyperimmune antiserum can be raised against a purified reference polypeptide as such or as a fusion polypeptide, for example, an expression product of MBP, GST, or His-tag systems, or a synthetic peptide predicted to be antigenic. The homologous polypeptide or derivative that is screened for specific antigenicity can be produced as such or as a fusion polypeptide. In the latter case, and if the antiserum is also raised against a fusion polypeptide, two different fusion systems are employed. Specific antigenicity can be determined using a number of methods, including Western blot (Towbin et al., Proc. Natl. Acad. Sci. USA 76:4350, 1979), dot blot, and ELISA methods, as described below.

[0118] In a Western blot assay, the product to be screened, either as a purified preparation or a total E. coli extract, is fractionated by SDS-PAGE, as described, for example, by Laemmli (Nature 227:680, 1970). After being transferred to a filter, such as a nitrocellulose membrane, the material is incubated with the monospecific hyperimmune antiserum, which is diluted in a range of dilutions from about 1:50 to about 1:5000, preferably from about 1:100 to about 1:500. Specific antigenicity is shown once a band corresponding to the product exhibits reactivity at any of the dilutions in the range.

[0119] In an ELISA assay, the product to be screened can be used as the coating antigen. A purified preparation is preferred, but a whole cell extract can also be used. Briefly, about 100 μL of a preparation of about 10 μg protein/mL is distributed into wells of a 96-well ELISA plate. The plate is incubated for about 2 hours at 37° C., then overnight at 4° C. The plate is washed with phosphate-buffered saline (PBS) containing 0.05% Tween 20 (PBS/Tween buffer) and the wells are saturated with 250 μL PBS containing 1% bovine serum albumin (BSA), to prevent non-specific antibody binding. After 1 hour of incubation at 37° C., the plate is washed with PBS/Tween buffer. The antiserum is serially diluted in PBS/Tween buffer containing 0.5% BSA, and 100 μL of each dilution is added to a well. The plate is incubated for 90 minutes at 37° C., washed, and evaluated using standard methods. For example, a goat anti-rabbit peroxidase conjugate can be added to the wells when the specific antibodies used were raised in rabbits. Incubation is carried out for about 90 minutes at 37° C. and the plate is washed. The reaction is developed with the appropriate substrate and the reaction is measured by colorimetry (absorbance measured spectrophotometrically). Under these experimental conditions, a positive reaction is shown once an O.D. value of 1.0 is detected with a dilution of at least about 1:50, preferably of at least about 1:500.

[0120] In a dot blot assay, a purified product is preferred, although a whole cell extract can be used. Briefly, a solution of the product at a concentration of about 100 μg/mL is serially diluted two-fold with 50 mM Tris-HCl (pH 7.5). One hundred μL of each dilution is applied to a filter, such as a 0.45 μm nitrocellulose membrane, set in a 96-well dot blot apparatus (Biorad). The buffer is removed by applying vacuum to the system. Wells are washed by addition of 50 mM Tris-HCl (pH 7.5) and the membrane is air-dried. The membrane is saturated in blocking buffer (50 mM Tris-HCl (pH 7.5), 0.15 M NaCl, 10 g/L skim milk) and incubated with an antiserum diluted from about 1:50 to about 1:5000, preferably about 1:500. The reaction is detected using standard methods. For example, a goat anti-rabbit peroxidase conjugate can be added to the wells when rabbit antibodies are used. Incubation is carried out for about 90 minutes at 37° C. and the blot is washed. The reaction is developed with the appropriate substrate and stopped. The reaction is then measured visually by the appearance of a colored spot, e.g., by colorimetry. Under these experimental conditions, a positive reaction is associated with detection of a colored spot for reactions carried out with a dilution of at least about 1:50, preferably, of at least about 1:500. Therapeutic or prophylactic efficacy of a polypeptide or polypeptide derivative of the invention can be evaluated as described below.
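The two-fold serial dilution used in the dot blot assay above can be tabulated with a short Python sketch. The function names are hypothetical, and two points are assumptions made only for illustration because the text does not state them: the number of dilution steps, and the treatment of the undiluted 100 μg/mL solution as the first member of the series.

```python
def twofold_series(start_ug_per_ml=100.0, steps=8):
    """Concentrations (ug/mL) of a two-fold serial dilution.

    The first entry is the undiluted starting solution; `steps` is an
    assumed number of series members, not a value given in the text.
    """
    return [start_ug_per_ml / (2 ** i) for i in range(steps)]


def mass_per_spot_ug(conc_ug_per_ml, volume_ul=100.0):
    """Mass of product (ug) applied to the membrane in one well."""
    return conc_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    for conc in twofold_series():
        print(f"{conc:9.4f} ug/mL -> {mass_per_spot_ug(conc):8.5f} ug per 100 uL spot")
```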

[0121] According to a seventh aspect of the invention, there is provided (i) a composition of matter containing a polypeptide of the invention together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a polypeptide of the invention; (iii) a method for inducing an immune response against Helicobacter in a mammal by administering to the mammal an immunogenically effective amount of a polypeptide of the invention to elicit an immune response, e.g., a protective immune response to Helicobacter; and (iv) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a prophylactic or therapeutic amount of a polypeptide of the invention to an individual in need of such treatment. Additionally, this aspect of the invention includes the use of a polypeptide of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection.

[0122] The immunogenic compositions of the invention can be administered by any conventional route in use in the vaccine field, for example, to a mucosal (e.g., ocular, intranasal, pulmonary, oral, gastric, intestinal, rectal, vaginal, or urinary tract) surface or via a parenteral (e.g., subcutaneous, intradermal, intramuscular, intravenous, or intraperitoneal) route. The choice of the administration route depends upon a number of parameters, such as the adjuvant used. For example, if a mucosal adjuvant is used, the intranasal or oral route will be preferred, and if a lipid formulation or an aluminum compound is used, a parenteral route will be preferred. In the latter case, the subcutaneous or intramuscular route is most preferred. The choice of administration route can also depend upon the nature of the vaccine agent. For example, a polypeptide of the invention fused to CTB or to LTB will be best administered to a mucosal surface.

[0123] A composition of the invention can contain one or several polypeptides or derivatives of the invention. It can also contain at least one additional Helicobacter antigen, such as the urease apoenzyme, or a subunit, fragment, homolog, mutant, or derivative thereof.

[0124] For use in a composition of the invention, a polypeptide or polypeptide derivative can be formulated into or with liposomes, such as neutral or anionic liposomes, microspheres, ISCOMS, or virus-like particles (VLPs), to facilitate delivery and/or enhance the immune response. These compounds are readily available to those skilled in the art; for example, see Liposomes: A Practical Approach (supra). Adjuvants other than liposomes can also be used in the invention and are well known in the art (see, for example, the list provided below).

[0125] Administration can be achieved in a single dose or repeated as necessary at intervals that can be determined by one skilled in the art. For example, a priming dose can be followed by three booster doses at weekly or monthly intervals. An appropriate dose depends on various parameters, including the nature of the recipient (e.g., whether the recipient is an adult or an infant), the particular vaccine antigen, the route and frequency of administration, the presence/absence or type of adjuvant, and the desired effect (e.g., protection and/or treatment), and can be readily determined by one skilled in the art. In general, a vaccine antigen of the invention can be administered mucosally in an amount ranging from about 10 μg to about 500 mg, preferably from about 1 mg to about 200 mg. For a parenteral route of administration, the dose usually should not exceed about 1 mg, and is, preferably, about 100 μg.

[0126] When used as components of a vaccine, the polynucleotides and polypeptides of the invention can be used sequentially as part of a multi-step immunization process. For example, a mammal can be initially primed with a vaccine vector of the invention, such as a pox virus, e.g., via a parenteral route, and then boosted twice with a polypeptide encoded by the vaccine vector, e.g., via the mucosal route. In another example, liposomes associated with a polypeptide or polypeptide derivative of the invention can be used for priming, with boosting being carried out mucosally using a soluble polypeptide or polypeptide derivative of the invention, in combination with a mucosal adjuvant (e.g., LT).

[0127] Polypeptides and polypeptide derivatives of the invention can also be used as diagnostic reagents for detecting the presence of anti-Helicobacter antibodies, e.g., in blood samples. Such polypeptides can be about 5 to about 80, preferably, about 10 to about 50 amino acids in length and can be labeled or unlabeled, depending upon the diagnostic method. Diagnostic methods involving such a reagent are described below.

[0128] Upon expression of a polynucleotide molecule of the invention, a polypeptide or polypeptide derivative is produced and can be purified using known methods. For example, the polypeptide or polypeptide derivative can be produced as a fusion protein containing a fused tail that facilitates purification. The fusion product can be used to immunize a small mammal, e.g., a mouse or a rabbit, in order to raise monospecific antibodies against the polypeptide or polypeptide derivative. The eighth aspect of the invention thus provides a monospecific antibody that binds to a polypeptide or polypeptide derivative of the invention.

[0129] By “monospecific antibody” is meant an antibody that is capable of reacting with a unique, naturally-occurring Helicobacter polypeptide. An antibody of the invention can be polyclonal or monoclonal. Monospecific antibodies can be recombinant, e.g., chimeric (e.g., consisting of a variable region of murine origin and a human constant region), humanized (e.g., a human immunoglobulin constant region and a variable region of animal, e.g., murine, origin), and/or single chain. Both polyclonal and monospecific antibodies can also be in the form of immunoglobulin fragments, e.g., F(ab′)₂ or Fab fragments. The antibodies of the invention can be of any isotype, e.g., IgG or IgA, and polyclonal antibodies can be of a single isotype or can contain a mixture of isotypes.

[0130] The antibodies of the invention, which can be raised to a polypeptide or polypeptide derivative of the invention, can be produced and identified using standard immunological assays, e.g., Western blot assays, dot blot assays, or ELISA (see, e.g., Coligan et al., Current Protocols in Immunology, John Wiley & Sons, Inc., New York, N.Y., 1994). The antibodies can be used in diagnostic methods to detect the presence of Helicobacter antigens in a sample, such as a biological sample. The antibodies can also be used in affinity chromatography methods for purifying a polypeptide or polypeptide derivative of the invention. As is discussed further below, the antibodies can also be used in prophylactic and therapeutic passive immunization methods.

[0131] Accordingly, a ninth aspect of the invention provides (i) a reagent for detecting the presence of Helicobacter in a biological sample that contains an antibody, polypeptide, or polypeptide derivative of the invention; and (ii) a diagnostic method for detecting the presence of Helicobacter in a biological sample, by contacting the biological sample with an antibody, a polypeptide, or a polypeptide derivative of the invention, so that an immune complex is formed, and detecting the complex as an indication of the presence of Helicobacter in the sample or the organism from which the sample was derived. The immune complex is formed between a component of the sample and the antibody, polypeptide, or polypeptide derivative, and any unbound material can be removed prior to detecting the complex. A polypeptide reagent can be used for detecting the presence of anti-Helicobacter antibodies in a sample, e.g., a blood sample, while an antibody of the invention can be used for screening a sample, such as a gastric extract or biopsy sample, for the presence of Helicobacter polypeptides.

[0132] For use in diagnostic methods, the reagent (e.g., the antibody, polypeptide, or polypeptide derivative of the invention) can be in a free state or can be immobilized on a solid support, such as, for example, on the interior surface of a tube or on the surface, or within pores, of a bead. Immobilization can be achieved using direct or indirect means. Direct means include passive adsorption (i.e., non-covalent binding) or covalent binding between the support and the reagent. By “indirect means” is meant that an anti-reagent compound that interacts with the reagent is first attached to the solid support. For example, if a polypeptide reagent is used, an antibody that binds to it can serve as an anti-reagent, provided that it binds to an epitope that is not involved in recognition of antibodies in biological samples. Indirect means can also employ a ligand-receptor system, for example, a molecule, such as a vitamin, can be grafted onto the polypeptide reagent and the corresponding receptor can be immobilized on the solid phase. This concept is illustrated by the well known biotin-streptavidin system. Alternatively, indirect means can be used, e.g., by adding to the reagent a peptide tail, chemically or by genetic engineering, and immobilizing the grafted or fused product by passive adsorption or covalent linkage of the peptide tail.

[0133] According to a tenth aspect of the invention, there is provided a process for purifying, from a biological sample, a polypeptide or polypeptide derivative of the invention, which involves carrying out antibody-based affinity chromatography with the biological sample, wherein the antibody is a monospecific antibody of the invention.

[0134] For use in a purification process of the invention, the antibody can be polyclonal or monospecific, and preferably is of the IgG type. Purified IgGs can be prepared from an antiserum using standard methods (see, e.g., Coligan et al., supra). Conventional chromatography supports, as well as standard methods for grafting antibodies, are described, for example, by Harlow et al. (Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1988).

[0135] Briefly, a biological sample, such as an H. pylori extract, preferably in a buffer solution, is applied to a chromatography material, which is, preferably, equilibrated with the buffer used to dilute the biological sample, so that the polypeptide or polypeptide derivative of the invention (i.e., the antigen) is allowed to adsorb onto the material. The chromatography material, such as a gel or a resin coupled to an antibody of the invention, can be in batch form or in a column. The unbound components are washed off and the antigen is eluted with an appropriate elution buffer, such as a glycine buffer, a buffer containing a chaotropic agent, e.g., guanidine HCl, or a buffer having high salt concentration (e.g., 3 M MgCl₂). Eluted fractions are recovered and the presence of the antigen is detected, e.g., by measuring the absorbance at 280 nm.

[0136] An antibody of the invention can be screened for therapeutic efficacy as follows. According to an eleventh aspect of the invention, there is provided (i) a composition of matter containing a monospecific antibody of the invention, together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a monospecific antibody of the invention, and (iii) a method for treating or preventing Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a therapeutic or prophylactic amount of a monospecific antibody of the invention to an individual in need of such treatment. In addition, the eleventh aspect of the invention includes the use of a monospecific antibody of the invention in the preparation of a medicament for treating or preventing Helicobacter infection.

[0137] The monospecific antibody can be polyclonal or monoclonal, and is, preferably, predominantly of the IgA isotype. In passive immunization methods, the antibody is administered to a mucosal surface of a mammal, e.g., the gastric mucosa, e.g., orally or intragastrically, optionally, in the presence of a bicarbonate buffer. Alternatively, systemic administration, not requiring a bicarbonate buffer, can be carried out. A monospecific antibody of the invention can be administered as a single active agent or as a mixture with at least one additional monospecific antibody specific for a different Helicobacter polypeptide. The amount of antibody and the particular regimen used can be readily determined by one skilled in the art. For example, daily administration of about 100 to 1,000 mg of antibody over one week, or three doses per day of about 100 to 1,000 mg of antibody over two or three days, can be effective regimens for most purposes.

[0138] Therapeutic or prophylactic efficacy can be evaluated using standard methods in the art, e.g., by measuring induction of a mucosal immune response or induction of protective and/or therapeutic immunity, using, e.g., the H. felis mouse model and the procedures described by Lee et al. (Eur. J. Gastroenterology & Hepatology 7:303, 1995) or Lee et al. (J. Infect. Dis. 172:161, 1995). Those skilled in the art will recognize that the H. felis strain of the model can be replaced with another Helicobacter strain. For example, the efficacy of polynucleotide molecules and polypeptides from H. pylori is, preferably, evaluated in a mouse model using an H. pylori strain. Protection can be determined by comparing the degree of Helicobacter infection in the gastric tissue assessed by, for example, urease activity, bacterial counts, or gastritis, to that of a control group. Protection is shown when infection is reduced by comparison to the control group. Such an evaluation can be made for polynucleotides, vaccine vectors, polypeptides, and polypeptide derivatives, as well as for antibodies of the invention.

[0139] For example, various doses of an antibody of the invention can be administered to the gastric mucosa of mice previously challenged with an H. pylori strain, as described, e.g., by Lee et al. (supra). Then, after an appropriate period of time, the bacterial load of the mucosa can be estimated by assessing urease activity, as compared to a control. Reduced urease activity indicates that the antibody is therapeutically effective.

[0140] Adjuvants that can be used in any of the vaccine compositions described above are described as follows. Adjuvants for parenteral administration include, for example, aluminum compounds, such as aluminum hydroxide, aluminum phosphate, and aluminum hydroxy phosphate. The antigen can be precipitated with, or adsorbed onto, the aluminum compound using standard methods. Other adjuvants, such as RIBI (ImmunoChem, Hamilton, Mont.), can also be used in parenteral administration.

[0141] Adjuvants that can be used for mucosal administration include, for example, bacterial toxins, e.g., the cholera toxin (CT), the E. coli heat-labile toxin (LT), the Clostridium difficile toxin A, the pertussis toxin (PT), and combinations, subunits, toxoids, or mutants thereof. For example, a purified preparation of native cholera toxin subunit B (CTB) can be used. Fragments, homologs, derivatives, and fusions to any of these toxins can also be used, provided that they retain adjuvant activity. Preferably, a mutant having reduced toxicity is used. Suitable mutants are described, e.g., in WO 95/17211 (Arg-7-Lys CT mutant), WO 96/6627 (Arg-192-Gly LT mutant), and WO 95/34323 (Arg-9-Lys and Glu-129-Gly PT mutant). Additional LT mutants that can be used in the methods and compositions of the invention include, e.g., Ser-63-Lys, Ala-69-Gly, Glu-110-Asp, and Glu-112-Asp mutants. Other adjuvants, such as the bacterial monophosphoryl lipid A (MPLA) of, e.g., E. coli, Salmonella minnesota, Salmonella typhimurium, or Shigella flexneri; saponins; and polylactide glycolide (PLGA) microspheres, can also be used in mucosal administration. Adjuvants useful for both mucosal and parenteral administrations, such as polyphosphazene (WO 95/2415), can also be used.

[0142] Any pharmaceutical composition of the invention, containing a polynucleotide, polypeptide, polypeptide derivative, or antibody of the invention, can be manufactured using standard methods. It can be formulated with a pharmaceutically acceptable diluent or carrier, e.g., water or a saline solution, such as phosphate buffered saline, optionally, including a bicarbonate salt, such as sodium bicarbonate, e.g., 0.1 to 0.5 M. Bicarbonate can advantageously be added to compositions intended for oral or intragastric administration. In general, a diluent or carrier can be selected on the basis of the mode and route of administration, and standard pharmaceutical practice. Suitable pharmaceutical carriers and diluents, as well as pharmaceutical necessities for their use in pharmaceutical formulations, are described in Remington's Pharmaceutical Sciences, a standard reference text in this field, and in the USP/NF.

[0143] The invention also includes methods in which gastroduodenal infections, such as Helicobacter infection, are treated by oral administration of a Helicobacter polypeptide of the invention and a mucosal adjuvant, in combination with an antibiotic, an antisecretory agent, a bismuth salt, an antacid, sucralfate, or a combination thereof. Examples of such compounds that can be administered with the vaccine antigen and an adjuvant are antibiotics, including, e.g., macrolides, tetracyclines, β-lactams, aminoglycosides, quinolones, penicillins, and derivatives thereof (specific examples of antibiotics that can be used in the invention include, e.g., amoxicillin, clarithromycin, tetracycline, metronidazole, erythromycin, and cefuroxime); antisecretory agents, including, e.g., H₂-receptor antagonists (e.g., cimetidine, ranitidine, famotidine, nizatidine, and roxatidine), proton pump inhibitors (e.g., omeprazole, lansoprazole, and pantoprazole), prostaglandin analogs (e.g., misoprostol and enprostil), and anticholinergic agents (e.g., pirenzepine, telenzepine, carbenoxolone, and proglumide); and bismuth salts, including colloidal bismuth subcitrate, tripotassium dicitrate bismuthate, bismuth subsalicylate, bicitropeptide, and pepto-bismol (see, e.g., Goodwin et al., Helicobacter pylori, Biology and Clinical Practice, CRC Press, Boca Raton, Fla., pp 366-395, 1993; Physicians' Desk Reference, 49th edn., Medical Economics Data Production Company, Montvale, N.J., 1995). In addition, compounds containing more than one of the above-listed components coupled together, e.g., ranitidine coupled to bismuth subcitrate, can be used. The invention also includes compositions for carrying out these methods, i.e., compositions containing a Helicobacter antigen (or antigens) of the invention, an adjuvant, and one or more of the above-listed compounds, in a pharmaceutically acceptable carrier or diluent.

[0144] Amounts of the above-listed compounds used in the methods and compositions of the invention can readily be determined by one skilled in the art. In addition, one skilled in the art can readily design treatment/immunization schedules. For example, the non-vaccine components can be administered on days 1-14, and the vaccine antigen + adjuvant can be administered on days 7, 14, 21, and 28.

[0145] Methods and pharmaceutical compositions of the invention can be used to treat or to prevent Helicobacter infections and, accordingly, gastroduodenal diseases associated with these infections, including acute, chronic, and atrophic gastritis, and peptic ulcer diseases, e.g., gastric and duodenal ulcers.

[0146] A 76 kDa protein band containing GHPO 386, GHPO 789, and GHPO 1516 (hereinafter the “purified 76 kDa proteins”) was purified from Helicobacter pylori strain ATCC number 43579 (American Type Culture Collection, Rockville, Md.) by immunoaffinity-based chromatography using the methods described below in Example 1, and the purified 76 kDa proteins were shown to be effective vaccine antigens as follows.

[0147] Groups of 10 mice each were orally immunized with 1, 5, or 25 μg of the purified 76 kDa proteins, in combination with 5 μg of the heat-labile enterotoxin (LT) of E. coli. Twenty-five μg of recombinant urease, in combination with 5 μg LT, was used as a positive control, and 5 μg of LT in PBS was used as a negative control. The immunizations were carried out four times each, on days 0, 7, 14, and 21 of the experiment. On day 33, blood samples were collected from the mice and, on day 34, saliva samples were collected. On day 35, all of the mice were challenged by intragastric administration of 1×10⁷ streptomycin-resistant, mouse-adapted H. pylori. On day 49, additional saliva samples were collected and, about two weeks after challenge, on days 52-53, the mice were sacrificed. Stomachs were removed from the mice and were analyzed for Helicobacter infection by measuring urease activity in the intact stomach tissue and by a quantitative culture study (Table 1).

[0148] Briefly, these studies showed that the gastric urease activities in samples from mice immunized with all three amounts of the purified 76 kDa proteins (i.e., 1, 5, and 25 μg), in combination with LT, were generally lower than the gastric urease activities of samples from mice immunized with LT alone or mice that were not treated prior to challenge. Levels of gastric urease activity generally decreased with increasing amounts of the purified 76 kDa proteins administered, with the gastric urease activity levels for the 25 μg doses of the purified 76 kDa proteins generally approaching those of mice immunized with 25 μg of recombinant urease and LT.

[0149] The quantitative culture analyses showed that the levels of Helicobacter detected in the stomachs of mice immunized with the purified 76 kDa proteins, which generally decreased with increasing dosages, were less than the levels detected in the stomachs of control mice that were immunized with LT alone or untreated before Helicobacter challenge (Table 1). The percentages of mice protected by immunization with the purified 76 kDa proteins approached the percentages of mice protected by treatment with urease (Table 1). These results show that the purified 76 kDa proteins are effective vaccine antigens for use in preventing Helicobacter infection.

TABLE 2
Prophylactic immunization with PMsv antigens as an oral dose response against an H. pylori challenge (BALB/c mice)

Column headings:
(a) Tx: treatment group
(b) # mice infected: infection status, ANTRUM, A550, 0.148 O.D. cutoff
(c) Fisher's exact test based on quant. A550 ratios, Tx. gr. vs. LT only (gr. 11): p-value
(d) CFU/ml (¼ antrum): MEAN ± SD
(e) Wilcoxon rank sums test based on quant. CFU, Tx. gr. vs. LT control (gr. 11): p-value

(a) Tx                  (b) # mice infected   (c) p-value      (d) CFU/ml, MEAN ± SD   (e) p-value
1 μg 76 kDa + LT        56% (5/9)             0.1409            39922 ± 34708          0.2203
5 μg 76 kDa + LT        80% (4/5)             1                  8802 ± 7788           0.0864
25 μg 76 kDa + LT       33% (3/9)             0.0198             9712 ± 12183          0.0178
25 μg rUre + LT         0 (0/9)               0.0001             8208 ± 8021           0.0179
LT                      90% (9/10)                             107340 ± 127949
[no label in source]    90% (9/10)            not determined    46173 ± 42325          0.2568
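The Fisher's exact test p-values reported in the table can be checked from the infection counts alone. The following Python sketch is offered as an illustration and is not the software used in the study; the function name is hypothetical, and it assumes the common two-sided convention in which the p-value is the sum of the probabilities of all 2×2 tables (with the same margins) that are no more probable than the observed table.

```python
# Illustrative sketch: two-sided Fisher's exact test for a 2x2 table,
# using only the standard library. The two-sided convention is an assumption.
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the table [[a, b], [c, d]].

    Here a/b are infected/uninfected mice in the treated group and
    c/d are infected/uninfected mice in the LT-only control group.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x infected mice in the treated group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    control = (9, 1)  # LT only: 9 of 10 mice infected
    groups = {
        "1 ug 76 kDa + LT": (5, 4),
        "5 ug 76 kDa + LT": (4, 1),
        "25 ug 76 kDa + LT": (3, 6),
        "25 ug rUre + LT": (0, 9),
    }
    for name, (inf, uninf) in groups.items():
        print(f"{name}: p = {fisher_exact_two_sided(inf, uninf, *control):.4f}")
```

Under this convention the computed values round to 0.1409, 1, 0.0198, and 0.0001 for the four treated groups, in agreement with the tabulated p-values.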

[0150] The invention is further illustrated by the following examples. Example 1 describes purification of GHPO 1516, which is a 76 kDa protein, from Helicobacter cultures. Example 2 describes identification of genes, e.g., genes encoding 76 kDa proteins such as GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185, GHPO 1417, and GHPO 1414, in the Helicobacter genome, as well as identification of leader sequences, and primer design for amplification of genes lacking signal sequences. Example 3 describes cloning of DNA encoding GHPO 386, GHPO 789, GHPO 1516, and GHPO 896 into a vector that provides a histidine tag, and production and purification of the resulting His-tagged fusion proteins. Example 4 describes methods for cloning DNA encoding the polypeptides of the invention so that they can be produced without His-tags, and Example 5 describes methods for purifying recombinant polypeptides of the invention.

EXAMPLE 1

[0151] Purification and Partial Sequence Analysis of GHPO 1516, a 76 kDa Protein, from Helicobacter pylori

[0152] 1.A. Culture and Initial Purification Steps

[0153] Frozen seeds from H. pylori strain ATCC 43579 are used to seed a 75 cm² flask containing a biphasic medium (a solid phase made of Columbia agar containing 6% fresh sheep blood and a liquid phase made of Trypticase soy containing 20% fetal calf serum). After 24 hours of culturing under microaerophilic conditions, the liquid phase is used for seeding several 75 cm² flasks containing biphasic medium lacking sheep blood. After 24 hours of culture, the liquid phase is used to seed a 2 L biofermentor in Trypticase soy liquid phase containing 10 g/L beta-cyclodextrin. At OD 1.5-1.8, this culture is diluted in a 10 L biofermentor containing the liquid medium. After 24 hours, the bacteria are centrifuged at 4,000× g for 30 minutes at 4° C. A 10 L culture contains about 20 to 30 g (wet weight) bacteria.

[0154] The pellet obtained using the method described above is washed with 500 mL phosphate buffered saline (PBS: 7.650 g NaCl, 0.724 g disodium phosphate, and 0.210 g monopotassium phosphate for one liter (pH 7.2)) for a one liter culture. The bacteria are then centrifuged again under the same conditions.

[0155] The pellet (C1) is suspended in 1% N-octyl-D-glucopyranoside (NOG; 30 mL/L; Sigma). The bacterial suspension is incubated for 1 hour at room temperature under stirring, centrifuged at 17,600× g for 30 minutes at 4° C., and the pellet (C2) is recovered.

[0156] The supernatant (S2) is dialyzed against PBS overnight at 4° C. under stirring. The precipitate is recovered by centrifugation at 2,600× g for 30 minutes at 4° C. The supernatant (S2d) is discarded and the pellet (Cs2d) is recovered and stored at −20° C.

[0157] The pellet (C2) is resuspended in 20 mM Tris-HCl buffer (pH 7.5) and 100 μM Pefabloc (buffer A), and is homogenized with an ultra-turrax (3821, Janke and Kungel). Lysozyme and EDTA are added at 0.1 mg/mL and 1 mM, respectively.

[0158] The homogenate is sonicated three times for 2 minutes each at 4° C., and then is spun in an ultracentrifuge at 210,000× g for 30 minutes at 4° C. The supernatant (S3), which contains the cytoplasmic and periplasmic proteins, is eliminated, while the pellet is recovered, washed with buffer A, and spun in an ultracentrifuge at 210,000× g for 30 minutes at 4° C. The supernatant (S4) is eliminated and the pellet (C4) is stored at −20° C. This pellet (C4) contains membrane proteins.

[0159] The pellet is washed in 50 mM NaCO₃ (pH 9.5) and 100 μM Pefabloc (buffer B). The suspension is spun in an ultracentrifuge at 210,000× g for 30 minutes at 4° C. The supernatant (S5) is eliminated, and the pellet (C5) is then washed and spun in an ultracentrifuge as is described above. The supernatant (S6) is eliminated and the pellet (C6) is stored at −20° C.

[0160] 1.B. Purification of the Proteins of Membrane Fraction C4 by Preparative SDS-PAGE

[0161] SDS-PAGE is carried out according to the method of Laemmli (supra), using a biphasic gel consisting of a 5% polyacrylamide concentrating gel and a 10% polyacrylamide separating gel. The membrane fraction C4 is resuspended in buffer A, diluted in an equal volume of 2× sample buffer, and heated for 5 minutes at 95° C. About 19 mg of protein is applied to the gel (16×12 cm; thickness: 5 mm). Pre-migration is carried out for 2 hours at 50 V, and is followed by migration overnight at 65 V. After Coomassie blue staining, five major bands are revealed that have apparent molecular weights of 87, 76, 54, 50, and 32 kDa. Bands at 50 and 32 kDa appear to be slightly contaminated with bands at 47 and 35 kDa, respectively.

[0162] A band corresponding to the purified 76 kDa proteins is cut out from the gel and is homogenized with an ultra-turrax in 10-20 mL extraction buffer (25 mM Tris-HCl (pH 8.8), 8 M urea, 10% SDS, 100 μM phenyl methyl sulfonyl fluoride (PMSF), and 10 μM Pefabloc (buffer C)).

[0163] Each homogenate is filtered through a Millipore AP20 filter under 7 bars at room temperature, washed with 5-10 mL buffer C, and then filtered again. Each filtrate is precipitated with three volumes of a 50/50 mixture of 75% methanol and 75% isopropanol, and then is centrifuged at 240,000× g for 16 hours at 10° C.

[0164] Each pellet is resuspended in 2 mL of 10 mM NaPO₄ (pH 7.0) containing 1 M NaCl, 0.1% Sarkosyl, 100 μM PMSF, and 6 M urea (buffer D).

[0165] The solubilized sample is dialyzed, in order, against 100 mL buffer D containing 4 M urea, 100 mL buffer D containing 2 M urea and 0.5% Sarkosyl, and twice against 100 mL buffer D that does not contain urea or Sarkosyl. The dialyses are carried out for 1 hour each under stirring at room temperature. The last dialysate is incubated for 30 minutes in an ice bath, and then is centrifuged at low speed for 10 minutes at 4° C. The supernatant is recovered, filtered through a Millipore filter (0.45 μm), and stored at −20° C.

[0166] 1.C. Purification of the 76 kDa Protein by Immunoaffinity-based Chromatography

[0167] 1.C.1. Antiserum Preparation

[0168] Specific polyclonal serum against the purified 76 kDa proteins, which are purified by preparative SDS-PAGE, is prepared by hyperimmunizing rabbits as follows. On day 0, a preparation containing 50 μg of the protein mixed with complete Freund's adjuvant is administered subcutaneously to the rabbits at multiple sites. The rabbits are boosted at days 21 and 42 with 25 μg of the protein in incomplete Freund's adjuvant, and are sacrificed at day 60. Complement is removed from the serum by heating for 30 minutes at 56° C. The hyperimmune serum is then sterilized by filtration through a Millipore membrane (0.22 μm).

[0169] 1.C.2. IgG Purification

[0170] The hyperimmune serum prepared as described above is applied to a Protein A Sepharose Fast Flow column (Pharmacia) that is equilibrated with 100 mM Tris-HCl (pH 8.0). The column is washed with 10 column volumes of 100 mM Tris-HCl (pH 8.0), and then with 10 column volumes of 10 mM Tris-HCl (pH 8.0). IgGs are eluted in 0.1 M glycine buffer (pH 3.0), and are collected as 5 mL fractions, to each of which 0.25 mL of Tris-HCl (pH 8.0) is added. The optical density is measured at 280 nm, the IgG-containing fractions are pooled together, and, if necessary, frozen at −70° C.

[0171] 1.C.3. Preparation of the Column

[0172] An appropriate amount of CNBr-activated Sepharose 4B gel (Pharmacia; reference: 17-0430-01) is suspended in 1 mM NaCl buffer (1 g dry gel provides for 3.5 mL hydrated gel; 5 to 10 mg IgGs can be retained per mL of hydrated gel). The gel is then washed on a Büchner funnel by adding small quantities of 1 mM HCl. The total volume of 1 mM HCl that is used amounts to 200 mL/g of gel.
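The quantities of gel and wash solution follow directly from the ratios given above (3.5 mL hydrated gel per g of dry gel, 5 to 10 mg IgG retained per mL of hydrated gel, and 200 mL of 1 mM HCl per g of gel). The following Python sketch shows one way to carry out this arithmetic; the function name, the returned dictionary keys, and the default use of the lower capacity figure (5 mg/mL) are assumptions made for illustration.

```python
# Illustrative sketch: sizing a CNBr-activated Sepharose 4B column from the
# ratios stated in the text. Names and the default capacity are assumptions.

HYDRATED_ML_PER_G_DRY = 3.5   # 1 g dry gel gives 3.5 mL hydrated gel
HCL_WASH_ML_PER_G = 200.0     # total 1 mM HCl wash volume per g of gel


def plan_gel(igg_mg: float, capacity_mg_per_ml: float = 5.0) -> dict[str, float]:
    """Estimate gel and wash volumes needed to couple a given mass of IgG.

    capacity_mg_per_ml should lie in the stated 5-10 mg/mL range; the
    conservative lower bound is used by default.
    """
    if not 5.0 <= capacity_mg_per_ml <= 10.0:
        raise ValueError("capacity outside the 5-10 mg/mL range given in the text")
    hydrated_ml = igg_mg / capacity_mg_per_ml
    dry_g = hydrated_ml / HYDRATED_ML_PER_G_DRY
    return {
        "hydrated_gel_ml": hydrated_ml,
        "dry_gel_g": dry_g,
        "hcl_wash_ml": dry_g * HCL_WASH_ML_PER_G,
    }


if __name__ == "__main__":
    print(plan_gel(35.0))  # plan for coupling 35 mg of purified IgG
```

For example, coupling 35 mg of IgG at 5 mg per mL of hydrated gel calls for 7 mL of hydrated gel, i.e., 2 g of dry gel washed with 400 mL of 1 mM HCl.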

[0173] Purified IgGs are dialyzed for 4 hours at room temperature against 50 volumes of 500 mM sodium phosphate buffer (pH 7.5). The IgGs are then diluted to 3 mg/mL with the same buffer. IgGs are incubated with the gel overnight at 5±3° C. under stirring. The gel is packed in a chromatography column and is washed with 2 column volumes of 500 mM phosphate buffer (pH 7.5). The gel is then transferred to a tube and is incubated with 100 mM ethanolamine (pH 7.5), and then it is washed with 2 column volumes of PBS. The gel can be stored in PBS/merthiolate, 1/10,000.

[0174] 1.C.4. Adsorption and Elution

[0175] The membrane fraction Cs2d is suspended in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, and then is filtered through a 0.45 μm membrane. The supernatant is applied to the column, which is equilibrated with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, at a flow rate of about 10 mL/hour. The column is washed with 20 column volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, and then with 2 to 6 volumes of 10 mM phosphate buffer (pH 6.8).

[0176] The antigen is eluted with 100 mM glycine buffer (pH 2.5). The eluate is collected in 3 mL fractions, to each of which is added 150 μL of 1 M phosphate buffer (pH 8.0). The optical density of each fraction is measured at 280 nm, and fractions containing the 76 kDa protein are pooled and stored at −70° C.

[0177] Analysis by 10% SDS-PAGE reveals a single band at 76 kDa. N-terminal sequencing was carried out on this purified 76 kDa preparation, and the sequence obtained is as follows: EDDGFYTSVGYQIGEAAQMV (SEQ ID NO:58).

[0178] 1.D. Purification of the 76 kDa Protein from Membrane Fraction Cs2d

[0179] The 76 kDa protein can also be purified as follows. A 40 mL Q-Sepharose column (diameter: 2.5 cm; height: 8 cm) is prepared according to the manufacturer's instructions (Pharmacia). The column is washed and equilibrated with buffer B, containing 50 mM NaCO₃ (pH 9.5), 100 μM Pefabloc, and 0.1% Zwittergent 3-14. The chromatography is monitored by measuring absorbance at 280 nm at the column exit.

[0180] One hundred and forty mg of protein from the membrane fraction Cs2d resuspended in buffer B are applied to the column. The column is washed with 0.1 M NaCl in buffer B, and then a 0.1-0.5 M NaCl gradient is applied to the column. The fraction eluted between 0.35 and 0.45 M NaCl is further purified on a 10 mL S-Sepharose column (diameter: 1.5 cm; height: 5 cm; up to 10 mg protein/mL of gel), which is prepared according to the manufacturer's instructions (Pharmacia). The fraction obtained is dialyzed against 50 mM acetate (pH 5.0) containing 100 μM Pefabloc and 0.1% Zwittergent 3-14, and then is applied to the column, which is equilibrated with the acetate buffer.

[0181] The column is washed with the acetate buffer until the absorbance at 280 nm is stabilized (about 3 column volumes are required). Proteins are eluted with a 0-0.5 M NaCl gradient in acetate buffer. The fraction eluted at 0.15 M NaCl is enriched with the 76 kDa protein.

EXAMPLE 2

[0182] Identification of Genes in the H. pylori Genome, Such as Genes Encoding the 76 kDa Proteins, Identification of Leader Sequences, and Primer Design for Amplification of Genes Lacking Signal Sequences

[0183] 2.A. Creating H. pylori Genomic Databases

[0184] The H. pylori genome was provided as a text file containing a single contiguous string of nucleotides that had been determined to be 1.76 Megabases in length. The complete genome was split into 17 separate files using the program SPLIT (Creativity in Action), giving rise to 16 contigs, each containing 100,000 nucleotides, and a 17th contig containing the remaining 76,000 nucleotides. A header was added to each of the 17 files using the format: >hpg0.txt (representing contig 1), >hpg1.txt (representing contig 2), etc. The resulting 17 files, named hpg0 through hpg16, were then copied together to form one file that represented the plus strand of the complete H. pylori genome. The constructed database was given the designation “H.” A negative strand database of the H. pylori genome was created similarly by first creating a reverse complement of the positive strand using the program SeqPup (D. G. Gilbert, Indiana University Biology Department) and then performing the same procedure as described above for the plus strand. This database was given the designation “N.”
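The splitting and reverse-complement steps described above can be sketched in a few lines of Python. This is only an illustrative stand-in for the SPLIT and SeqPup programs named in the text, not a reproduction of them; the function names and the toy sequence are hypothetical, and the toy sequence is sized solely to reproduce the layout of 16 full contigs plus one 76,000-nucleotide remainder.

```python
# Illustrative sketch only: hypothetical helpers standing in for SPLIT and SeqPup.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_into_contigs(genome: str, size: int = 100_000) -> list[tuple[str, str]]:
    """Cut the genome into consecutive pieces and pair each with a header
    of the form >hpg0.txt, >hpg1.txt, ..."""
    return [
        (f">hpg{index}.txt", genome[start:start + size])
        for index, start in enumerate(range(0, len(genome), size))
    ]


def to_database(contigs: list[tuple[str, str]]) -> str:
    """Concatenate header/sequence pairs into one text database."""
    return "\n".join(f"{header}\n{seq}" for header, seq in contigs) + "\n"


# Toy stand-in for the genome text file (assumption: 1,676,000 nucleotides).
toy_genome = "ATGC" * 419_000

plus_contigs = split_into_contigs(toy_genome)                        # database "H"
minus_contigs = split_into_contigs(reverse_complement(toy_genome))   # database "N"
database_h = to_database(plus_contigs)
```

Keeping the header and the sequence together as a pair makes it straightforward to map a hit in a given contig back to a position in the whole genome later on.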

[0185] The regions predicted to encode open reading frames (ORFs) were defined for the complete H. pylori genome using the program GENEMARK™ (Borodovsky et al., Comp. Chem. 17:123, 1993). A database was created from a text file containing an annotated version of all ORFs predicted to be encoded by the H. pylori genome for both the plus and minus strands, and was given the designation “O.” Each ORF was assigned a number indicating its location on the genome and its position relative to other genes. No manipulation of the text file was required.

[0186] 2.B. Searching the H. pylori Databases

[0187] The databases constructed as described above were searched using the program FASTA (Pearson et al., Proc. Natl. Acad. Sci. USA 85:2444-2448, 1988). FASTA was used for searching either a DNA sequence against either of the gene databases (“H” and/or “N”), or a peptide sequence against the ORF library (“O”). TFASTX was used to search a peptide sequence against all possible reading frames of a DNA database (“H” and/or “N” libraries). With potential frameshifts also being resolved, FASTX was used for searching the translated reading frames of a DNA sequence against either a DNA database, or a peptide sequence against the protein database.

[0188] 2.C. Isolation of DNA Sequences from the H. pylori Genome

[0189] The FASTA searches against the constructed DNA databases identified exact nucleotide coordinates on one or more of the isolated contigs, and therefore the location of the target DNA. Once the exact location of the target sequence was known, the contig identified to carry the gene was exported into the software package MapDraw (DNAStar, Inc.) and the gene was isolated. Gene sequences with flanking DNA were then excised and copied into the EditSeq software package (DNAStar, Inc.) for further analysis.
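To illustrate how a hit on a contig translates into coordinates on the whole genome, the following Python sketch uses a plain exact-substring search. This is an assumption made for brevity: FASTA performs similarity searching, which this sketch does not attempt, and a match that spans the boundary between two contigs would be missed. The function name and the toy contigs are hypothetical.

```python
# Illustrative sketch only: exact matching as a simplified stand-in for a FASTA hit.

def locate(target: str, contigs: list[tuple[str, str]], contig_size: int = 100_000):
    """Return (header, 1-based position in contig, 1-based position in genome)
    for every exact occurrence of target in the ordered contigs."""
    hits = []
    for index, (header, seq) in enumerate(contigs):
        pos = seq.find(target)
        while pos != -1:
            hits.append((header, pos + 1, index * contig_size + pos + 1))
            pos = seq.find(target, pos + 1)
    return hits


# Toy contigs of 10 nucleotides each (hypothetical data).
toy_contigs = [(">hpg0.txt", "AAAACCCCGG"), (">hpg1.txt", "TTGAAGACGA")]
toy_hits = locate("GAAGAC", toy_contigs, contig_size=10)
```

With the contig header and the in-contig coordinate in hand, the relevant contig can be opened in a sequence editor and the gene excised together with its flanking DNA.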

[0190] 2.D. Identification of Leader Sequences

[0191] The deduced protein encoded by a target gene sequence is analyzed using the PROTEAN software package (DNAStar, Inc.). This analysis predicts those areas of the protein that are hydrophobic by using the Kyte-Doolittle algorithm, and identifies any potential polar residues preceding the hydrophobic core region, which is typical for many leader sequences. For confirmation, the target protein is then searched against a PROSITE database (DNAStar, Inc.) consisting of motifs and signatures. Characteristic of many leader sequences and hydrophobic regions in general is the identification of predicted prokaryotic lipid attachment sites. Where confirmation between the two approaches is apparent at the N-terminus of any protein, putative cleavage sites are sought. Specifically, this includes the presence of either an Alanine (A), Serine (S), or Glycine (G) residue immediately after the core hydrophobic region. In the case of lipoproteins, a Cysteine (C) residue would be identified as the +1 residue, post-cleavage.
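A minimal Python sketch of this kind of analysis is given below. It is not the PROTEAN implementation; it simply averages the published Kyte-Doolittle hydropathy values over a sliding window and checks the residues flanking a putative cleavage point. The window length of 7, the function names, and the reading of "immediately after the core hydrophobic region" as the residue just before the cleavage point are assumptions. The test sequence is the 19-residue signal sequence of SEQ ID NO:2 followed by the N-terminal sequence of SEQ ID NO:58.

```python
# Illustrative sketch only: sliding-window Kyte-Doolittle hydropathy.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i..i+window-1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def cleavage_residue_ok(protein: str, signal_length: int) -> bool:
    """True if the last residue of the putative leader is A, S, or G."""
    return protein[signal_length - 1] in "ASG"


def is_lipoprotein_candidate(protein: str, signal_length: int) -> bool:
    """True if the +1 residue after the putative cleavage point is Cys."""
    return protein[signal_length] == "C"


# Signal sequence (19 residues) plus the sequenced N-terminus of the mature protein.
n_terminus = "MKKTLLLSLSLSLSFLLHA" + "EDDGFYTSVGYQIGEAAQMV"
profile = hydropathy_profile(n_terminus)
```

For this sequence, the windows lying within the first 19 residues score markedly higher than those in the mature region, and the residue preceding the cleavage point is an alanine, which is consistent with the description above.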

[0192] 2.E. Rational Design of PCR Primers Based on the Identification of Leader Sequences

[0193] In order to clone gene sequences as N-terminus translational fusions for the generation of recombinant proteins with N-terminal Histidine tags, the gene sequence that specifies the leader sequence is omitted. The 5′-end of the gene-specific portion of the N-terminal primer is designed to start at the first codon beyond the cleavage site. In the case of lipoproteins, the 5′-end of the N-terminal primer begins at the second codon, immediately after the modifiable residue at position +1 post-cleavage. The omission of the leader sequence from the recombinant allows for one-step purification, and potential problems associated with insertion of leader sequences in the membrane of the host strain carrying the hybrid construct are avoided.
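The primer design rule can be illustrated with a short Python sketch. The function name, the 21-nucleotide gene-specific length, and the default "CGC" clamp with a BamHI site (GGATCC) are assumptions chosen for illustration; the sketch does not check melting temperature or reading frame relative to the vector. The coding sequence fragment is the beginning of the coding sequence of SEQ ID NO:1, and the lipoprotein flag is shown only to demonstrate the one-codon offset (this protein has no Cys at position +1).

```python
# Illustrative sketch only: choose the 5' end of the N-terminal primer.

# First 31 codons of the coding sequence of SEQ ID NO:1 (19-codon leader first).
CDS_START = (
    "ATG AAA AAA ACC CTT TTA CTC TCT CTC "
    "TCT CTC TCT CTC TCG TTT TTG CTC CAC GCT "
    "GAA GAC GAC GGC TTT TAC "
    "ACA AGC GTG GGC TAT CAA"
).replace(" ", "")


def n_terminal_primer(
    cds: str,
    signal_codons: int,
    lipoprotein: bool = False,
    length: int = 21,
    clamp: str = "CGC",
    site: str = "GGATCC",
) -> str:
    """Return clamp + restriction site + gene-specific portion.

    The gene-specific portion starts at the first codon after the cleavage
    site, or at the second codon after it when lipoprotein is True.
    """
    first_codon = signal_codons + (1 if lipoprotein else 0)
    start = 3 * first_codon
    return clamp + site + cds[start:start + length]


primer = n_terminal_primer(CDS_START, signal_codons=19)
primer_lipo = n_terminal_primer(CDS_START, signal_codons=19, lipoprotein=True)
```

Here the gene-specific portion begins with GAA, the codon for the glutamate that is the first residue of the mature protein.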

EXAMPLE 3

[0194] Preparation of Isolated DNA Encoding GHPO 386, GHPO 789, GHPO 1516, and GHPO 896 and Production of these Proteins as Histidine-tagged Fusion Proteins

[0195] 3.A. Preparation of Genomic DNA from Helicobacter pylori

[0196] Helicobacter pylori strain ORV2001, stored in LB medium containing 50% glycerol at −70° C., is grown on Columbia agar containing 7% sheep blood for 48 hours under microaerophilic conditions (8-10% CO₂, 5-7% O₂, 85-87% N₂). Cells are harvested, washed with phosphate-buffered saline (PBS) (pH 7.2), and DNA is then extracted from the cells using the Rapid Prep Genomic DNA Isolation kit (Pharmacia Biotech).

[0197] 3.B. PCR Amplification

[0198] DNA encoding GHPO 386, GHPO 789, GHPO 1516, and GHPO 896 is amplified from genomic DNA, which can be prepared as described above, by the Polymerase Chain Reaction (PCR) using the following primers:

GHPO 386:
N-terminal primer: 5′-CTGAATTCGATTTCAAGGAGAAAACATGAAA-3′ (SEQ ID NO:59); and
C-terminal primer: 5′-CCGCTCGAGTTAGTAAGCGAACACATAATT-3′ (SEQ ID NO:60).

GHPO 789:
N-terminal primer: 5′-CGCGGATCCGAATCCAATTTAATCCAAAAAGG-3′ (SEQ ID NO:61); and
C-terminal primer: 5′-CCGCTCGAGTTAGTAAGCGAACACATAGTTCAA-3′ (SEQ ID NO:62).

GHPO 1516:
N-terminal primer: 5′-CGCGGATCCGAATCCAATTTAATCCAAAAAGG-3′ (SEQ ID NO:56); and
C-terminal primer: 5′-CCGCTCGAGTTAAGTAAGCGAACACATATTCAA-3′ (SEQ ID NO:57).

GHPO 896:
N-terminal primer: 5′-CGCGGATCCGAAGTTTCTTTGTATCAAAG-3′ (SEQ ID NO:63); and
C-terminal primer: 5′-CCGCTCGAGTTAGTAAGCAAACACATAATTGTG-3′ (SEQ ID NO:64).

[0199] The N-terminal and C-terminal primers each include a 5′ clamp and a restriction enzyme recognition sequence for cloning purposes (BamHI (GGATCC) and XhoI (CTCGAG) recognition sequences). Amplification of gene-specific DNA is carried out using a heat-stable DNA Polymerase according to the manufacturer's instructions. The reaction mixture, which is brought to a final volume of 100 μL with distilled water, is as follows:

dNTPs mix: 200 μM
10x ThermoPol buffer: 10 μL
primers: 300 nM each
DNA template: 50 ng
DNA polymerase: 2 units
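As a quick consistency check, the primers can be scanned for the recognition sequences they carry. The Python sketch below is illustrative; the function name is hypothetical, and EcoRI (GAATTC) is included in the lookup table as an assumption, because the GHPO 386 N-terminal primer as printed contains GAATTC rather than GGATCC.

```python
# Illustrative sketch only: locate restriction recognition sequences in primers.

SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG", "EcoRI": "GAATTC"}

PRIMERS = {
    "GHPO 386 N": "CTGAATTCGATTTCAAGGAGAAAACATGAAA",
    "GHPO 386 C": "CCGCTCGAGTTAGTAAGCGAACACATAATT",
    "GHPO 789 N": "CGCGGATCCGAATCCAATTTAATCCAAAAAGG",
    "GHPO 789 C": "CCGCTCGAGTTAGTAAGCGAACACATAGTTCAA",
    "GHPO 1516 N": "CGCGGATCCGAATCCAATTTAATCCAAAAAGG",
    "GHPO 1516 C": "CCGCTCGAGTTAAGTAAGCGAACACATATTCAA",
    "GHPO 896 N": "CGCGGATCCGAAGTTTCTTTGTATCAAAG",
    "GHPO 896 C": "CCGCTCGAGTTAGTAAGCAAACACATAATTGTG",
}


def find_sites(primer: str) -> dict[str, int]:
    """Map enzyme name to the 0-based index of its first recognition sequence."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


site_report = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

In each primer the recognition sequence follows a short 5′ clamp of two or three nucleotides, which is the layout described in the paragraph above.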

[0200] Appropriate amplification reaction conditions can readily be determined by one skilled in the art. In the present case, the following conditions were used. For GHPO 386 and GHPO 789, in a reaction containing Taq DNA polymerase (Appligene), a denaturing step was carried out at 95° C. for 30 seconds, followed by an annealing step at 50° C. for one minute, and an extension step at 72° C. for 2 minutes and 30 seconds. Twenty five cycles were carried out. For GHPO 896, in a reaction containing Taq DNA polymerase, a denaturing step was carried out at 97° C. for 30 seconds, followed by an annealing step at 50° C. for one minute, and an extension step at 72° C. for 2 minutes and 30 seconds. Twenty five cycles were carried out. The same reaction conditions were used for GHPO 1516 as GHPO 896, except that Vent DNA polymerase was used for clone GHPO 1516, instead of Taq DNA polymerase, and the annealing temperature was 55° C.
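The three cycling programs can be written down as data, which also makes the total cycling time easy to estimate. The Python sketch below is illustrative; the dictionary layout and function name are hypothetical, and the estimate ignores ramping time and any initial denaturation or final extension step, none of which is specified in the text.

```python
# Illustrative sketch only: cycling programs as (temperature in °C, seconds) steps.

TAQ_95 = {"polymerase": "Taq", "cycles": 25, "steps": [(95, 30), (50, 60), (72, 150)]}

PROGRAMS = {
    "GHPO 386": TAQ_95,
    "GHPO 789": TAQ_95,
    "GHPO 896": {"polymerase": "Taq", "cycles": 25, "steps": [(97, 30), (50, 60), (72, 150)]},
    "GHPO 1516": {"polymerase": "Vent", "cycles": 25, "steps": [(97, 30), (55, 60), (72, 150)]},
}


def cycling_minutes(program: dict) -> float:
    """Hold time summed over all cycles, in minutes (ramping not included)."""
    seconds_per_cycle = sum(seconds for _, seconds in program["steps"])
    return program["cycles"] * seconds_per_cycle / 60


total_minutes = {clone: cycling_minutes(p) for clone, p in PROGRAMS.items()}
```

Each cycle holds for 240 seconds, so 25 cycles correspond to 100 minutes of hold time for every clone; the programs differ only in denaturing temperature, annealing temperature, and polymerase.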

[0201] 3.C. Transformation and Selection of Transformants

[0202] A single PCR product is thus amplified and is then digested at 37° C. for 2 hours with BamHI and XhoI concurrently in a 20 μL reaction volume. The digested product is ligated to similarly cleaved pET28a (Novagen) that is dephosphorylated prior to the ligation by treatment with Calf Intestinal Alkaline Phosphatase (CIP). The gene fusion constructed in this manner allows one-step affinity purification of the resulting fusion protein because of the presence of histidine residues at the N-terminus of the fusion protein, which are encoded by the vector.

[0203] The ligation reaction (20 μL) is carried out at 14° C. overnight and then is used to transform 100 μL fresh E. coli XL1-blue competent cells (Novagen). The cells are incubated on ice for 2 hours, then heat-shocked at 42° C. for 30 seconds, and returned to ice for 90 seconds. The samples are then added to 1 mL LB broth in the absence of selection and grown at 37° C. for 2 hours. The cells are then plated out on LB agar containing kanamycin (50 μg/mL) at 10× and neat dilutions and incubated overnight at 37° C. The following day, 50 colonies are picked onto secondary plates and incubated at 37° C. overnight.

[0204] Five colonies are picked into 3 mL LB broth supplemented with kanamycin (100 μg/mL) and are grown overnight at 37° C. Plasmid DNA is extracted using the Qiagen mini-prep method and is quantitated by agarose gel electrophoresis.

[0205] PCR is performed with the gene-specific primers under the conditions stated above and transformant DNA is confirmed to contain the desired insert. If PCR-positive, one of the five plasmid DNA samples (500 ng) extracted from the E. coli XL1-blue cells is used to transform competent BL21 (λDE3) E. coli cells (Novagen; as described previously). Transformants (10) are picked onto selective LB agar plates containing kanamycin (50 μg/mL) and stored as a research stock in LB containing 50% glycerol.

[0206] 3.D. Purification of Recombinant Proteins

[0207] One mL of frozen glycerol stock prepared as described in 3.C. is used to inoculate 50 mL of LB medium containing 25 μg/mL of kanamycin in a 250 mL Erlenmeyer flask. The flask is incubated at 37° C. for 2 hours or until the absorbance at 600 nm (OD₆₀₀) reaches 0.4-1.0. The culture is stopped from growing by placing the flask at 4° C. overnight. The following day, 10 mL of the overnight culture are used to inoculate 240 mL LB medium containing kanamycin (25 μg/mL), with the initial OD₆₀₀ about 0.02-0.04. Four flasks are inoculated for each ORF.
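The stated initial OD₆₀₀ follows from the dilution of 10 mL of starter culture into 240 mL of medium, a 25-fold dilution. A minimal Python sketch of the arithmetic is shown below; the function name is hypothetical, and the calculation assumes that optical density scales linearly with dilution and does not change during overnight storage at 4° C.

```python
# Illustrative sketch only: expected OD600 after diluting a starter culture.

def diluted_od(starter_od: float, inoculum_ml: float, medium_ml: float) -> float:
    """OD600 immediately after adding the inoculum to fresh medium."""
    return starter_od * inoculum_ml / (inoculum_ml + medium_ml)


low = diluted_od(0.4, inoculum_ml=10, medium_ml=240)
high = diluted_od(1.0, inoculum_ml=10, medium_ml=240)
```

Under these assumptions a starter culture at OD₆₀₀ 0.4-1.0 gives an initial OD₆₀₀ of roughly 0.016-0.04, in line with the range of about 0.02-0.04 given above.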

[0208] The cells are grown to an OD₆₀₀ of 1.0 (about 2 hours at 37° C.), a 1 mL sample is harvested by centrifugation, and the sample is analyzed by SDS-PAGE to detect any leaky expression. The remaining culture is induced with 1 mM IPTG and the induced cultures are grown for an additional 2 hours at 37° C.

[0209] The final OD₆₀₀ is taken and the cells are harvested by centrifugation at 5,000× g for 15 minutes at 4° C. The supernatant is discarded and the pellets are resuspended in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Two hundred and fifty mL of buffer are used for a 1 L culture and the cells are recovered by centrifugation at 12,000× g for 20 minutes. The supernatant is discarded and the pellets are stored at −45° C.

[0210] 3.E. Protein Purification

[0211] Pellets obtained from 3.D. are thawed and resuspended in 95 mL of 50 mM Tris-HCl (pH 8.0). Pefabloc and lysozyme are added to final concentrations of 100 μM and 100 μg/mL, respectively. The mixture is homogenized with magnetic stirring at 5° C. for 30 minutes. Benzonase (Merck) is added at a 1 U/mL final concentration, in the presence of 10 mM MgCl₂, to ensure total digestion of the DNA. The suspension is sonicated (Branson Sonifier 450) for 3 cycles of 2 minutes each at maximum output. The homogenate is centrifuged at 19,000× g for 15 minutes and both the supernatant and the pellet are analyzed by SDS-PAGE to detect the cellular location of the target protein in the soluble or insoluble fractions, as is described further below.

[0212] 3.E.1. Soluble Fraction

[0213] If the target protein is produced in a soluble form (i.e., in the supernatant obtained in 3.E.), NaCl and imidazole are added to the supernatant to final concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, and 10 mM imidazole (buffer A). The mixture is filtered through a 0.45 μm membrane and loaded onto an IMAC column (Pharmacia HiTrap chelating Sepharose; 1 mL) that has been charged with nickel ions according to the manufacturer's recommendations. After loading, the column is washed with 50 column volumes of buffer A and the recombinant target protein is eluted with 5 mL of buffer B (50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 500 mM imidazole).

[0214] The elution profile is monitored by measuring the absorbance of the fractions at 280 nm. Fractions corresponding to the protein peak are pooled, dialyzed against PBS containing 0.5 M arginine, filtered through a 0.22 μm membrane, and stored at −45° C.

[0215] 3.E.2. Insoluble Fraction

[0216] If the target protein is expressed in the insoluble fraction (pellets obtained from 3.E.), purification is conducted under denaturing conditions. NaCl, imidazole, and urea are added to the resuspended pellet to final concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 10 mM imidazole, and 6 M urea (buffer C). After complete solubilization, the mixture is filtered through a 0.45 μm membrane and loaded onto an IMAC column.

[0217] The purification procedures on the IMAC column are the same as described in 3.E.1., except that 6 M urea is included in all buffers used and 10 column volumes of buffer C are used to wash the column after protein loading, instead of 50 column volumes.

[0218] The protein fractions eluted from the IMAC column with buffer D (buffer C containing 500 mM imidazole) are pooled. Arginine is added to the solution to a final concentration of 0.5 M and the mixture is dialyzed against PBS containing 0.5 M arginine and various concentrations of urea (4 M, 3 M, 2 M, 1 M, and 0.5 M) to progressively decrease the concentration of urea. The final dialysate is filtered through a 0.22 μm membrane and stored at −45° C.

[0219] Alternatively, when the above purification process is not as efficient as it should be, two other processes may be used as follows. A first alternative involves the use of a mild denaturant, N-octyl glucoside (NOG). Briefly, a pellet obtained in 3.E. is homogenized in 5 mM imidazole, 500 mM sodium chloride, 20 mM Tris-HCl (pH 7.9) by microfluidization at a pressure of 15,000 psi and is clarified by centrifugation at 4,000-5,000× g. The pellet is recovered, resuspended in 50 mM NaPO₄ (pH 7.5) containing 1-2% weight/volume NOG, and homogenized. The NOG-soluble impurities are removed by centrifugation. The pellet is extracted once more by repeating the preceding extraction step. The pellet is dissolved in 8 M urea, 50 mM Tris (pH 8.0). The urea-solubilized protein is diluted with an equal volume of 2 M arginine, 50 mM Tris (pH 8.0), and is dialyzed against 1 M arginine for 24-48 hours to remove the urea. The final dialysate is filtered through a 0.22 μm membrane and stored at −45° C.

[0220] A second alternative involves the use of a strong denaturant, such as guanidine hydrochloride. Briefly, a pellet obtained in 3.E. is homogenized in 5 mM imidazole, 500 mM sodium chloride, 20 mM Tris-HCl (pH 7.9) by microfluidization at a pressure of 15,000 psi and clarified by centrifugation at 4,000-5,000× g. The pellet is recovered, resuspended in 6 M guanidine hydrochloride, and passed through an IMAC column charged with Ni++. The bound antigen is eluted with 8 M urea (pH 8.5). Beta-mercaptoethanol is added to the eluted protein to a final concentration of 1 mM, then the eluted protein is passed through a Sephadex G-25 column equilibrated in 0.1 M acetic acid. Protein eluted from the column is slowly added to 4 volumes of 50 mM phosphate buffer (pH 7.0). The protein remains in solution.

[0221] 3.F. Evaluation of the Protective Activity of the Purified Protein

[0222] A protection test is described above that was carried out for testing the protective activity of the purified, native proteins. This test can also be used for testing the protective efficacy of recombinant proteins. Alternatively, the following test can be used.

[0223] Groups of 10 OF1 mice (IFFA Credo) are immunized rectally with 25 μg of the purified recombinant protein, admixed with 1 μg of cholera toxin (Berna) in physiological buffer. Mice are immunized on days 0, 7, 14, and 21. Fourteen days after the last immunization, the mice are challenged with H. pylori strain ORV2001 grown in liquid media (the cells are grown on agar plates, as described in 1.A., and, after harvest, the cells are resuspended in Brucella broth; the flasks are then incubated overnight at 37° C.). Fourteen days after challenge, the mice are sacrificed and their stomachs are removed. The amount of H. pylori is determined by measuring the urease activity in the stomach and by culture.

[0224] 3.G. Production of Monospecific Polyclonal Antibodies

[0225] 3.G.1. Hyperimmune Rabbit Antiserum

[0226] New Zealand rabbits are injected both subcutaneously and intramuscularly with 100 μg of a purified fusion polypeptide, as obtained in 3.E.1. or 3.E.2., in the presence of Freund's complete adjuvant and in a total volume of approximately 2 mL. Twenty one and 42 days after the initial injection, booster doses, which are identical to priming doses, except that Freund's incomplete adjuvant is used, are administered in the same way. Fifteen days after the last injection, animal serum is recovered, decomplemented, and filtered through a 0.45 μm membrane.

[0227] 3.G.2. Mouse Hyperimmune Ascites Fluid

[0228] Ten mice are injected subcutaneously with 10-50 μg of a purified fusion polypeptide as obtained in 3.E.1. or 3.E.2., in the presence of Freund's complete adjuvant and in a volume of approximately 200 μL. Seven and 14 days after the initial injection, booster doses, which are identical to the priming doses, except that Freund's incomplete adjuvant is used, are administered in the same way. Twenty one and 28 days after the initial injection, mice receive 50 μg of the antigen alone intraperitoneally. On day 21, mice are also injected intraperitoneally with sarcoma 180/TG cells CM26684 (Lennette et al., Diagnostic Procedures for Viral, Rickettsial, and Chlamydial Infections, 5th Ed. Washington D.C., American Public Health Association, 1979). Ascites fluid is collected 10-13 days after the last injection.

EXAMPLE 4

[0229] Methods for Producing Transcriptional Fusions Lacking His-tags

[0230] Methods for amplification and cloning of DNA encoding the polypeptides of the invention as transcriptional fusions lacking His-tags are described as follows. Two PCR primers for each clone are designed based upon the sequences of the polynucleotides that encode them (SEQ ID NOs:1-21 (odd numbers)). These primers can be used to amplify DNA encoding the polypeptides of the invention from any Helicobacter pylori strain, including, for example, ORV2001 and the strain deposited with the American Type Culture Collection (ATCC, Rockville, Md.) as ATCC number 43579, as well as from other Helicobacter species.

[0231] The N-terminal primer is designed to include the ribosome binding site of the target gene, the ATG start site, the leader sequence, and the cleavage site. The N-terminal primers can include a 5′ clamp and restriction endonuclease recognition site, such as that for BamHI (GGATCC), which facilitates subsequent cloning. Similarly, the C-terminal primers can include a restriction endonuclease recognition site, such as that for XhoI (CTCGAG), which can be used in subsequent cloning, and a TAA stop codon. Specific primers that can be used are listed above.

[0232] Amplification of genes encoding the polypeptides of the invention is carried out using Vent DNA polymerase (New England Biolabs) or Taq DNA polymerase (Appligene) under the conditions described above in Example 3. Alternatively, Thermalase DNA polymerase or Pwo DNA polymerase (Boehringer Mannheim) can be used, according to instructions provided by the manufacturers.

[0233] A single PCR product for each clone is amplified and can be cloned into BamHI-XhoI cleaved pET 24, resulting in construction of transcriptional fusions that permit expression of the proteins without His-tags. The expressed products can be purified as denatured proteins that are refolded by dialysis into 1 M arginine.

[0234] Cloning into pET 24 allows transcription of genes from the T7 promoter, which is supplied by the vector, but relies upon binding of ribosomes to the intrinsic ribosome binding site of the genes, and thereby expression of the complete ORF. The amplification, digestion, and cloning protocols are as described above for constructing translational fusions.

EXAMPLE 5

[0235] Purification of the Polypeptides of the Invention by Immunoaffinity

[0236] 5.A. Purification of Specific IgGs

[0237] An immune serum, as prepared in section 3.G., is applied to a protein A Sepharose Fast Flow column (Pharmacia) equilibrated in 100 mM Tris-HCl (pH 8.0). The resin is washed by applying 10 column volumes of 100 mM Tris-HCl and 10 volumes of 10 mM Tris-HCl (pH 8.0) to the column. IgG antibodies are eluted with 0.1 M glycine buffer (pH 3.0) and are collected as 5 mL fractions to which is added 0.25 mL 1 M Tris-HCl (pH 8.0). The optical density of the eluate is measured at 280 nm and the fractions containing the IgG antibodies are pooled, dialyzed against 50 mM Tris-HCl (pH 8.0), and, if necessary, stored frozen at −70° C.

[0238] 5.B. Preparation of the Column

[0239] An appropriate amount of CNBr-activated Sepharose 4B gel (1 g of dried gel provides for approximately 3.5 mL of hydrated gel; gel capacity is from 5 to 10 mg coupled IgG/mL of gel) manufactured by Pharmacia (17-0430-01) is suspended in 1 mM HCl buffer and washed on a Büchner funnel by adding small quantities of 1 mM HCl buffer. The total volume of buffer is 200 mL per gram of gel.

[0240] Purified IgG antibodies are dialyzed for 4 hours at 20±5° C. against 50 volumes of 500 mM sodium phosphate buffer (pH 7.5). The antibodies are then diluted in 500 mM phosphate buffer (pH 7.5) to a final concentration of 3 mg/mL.

[0241] IgG antibodies are mixed with the gel overnight at 5±3° C. The gel is packed into a chromatography column and is washed with 2 column volumes of 500 mM phosphate buffer (pH 7.5), and 1 column volume of 50 mM sodium phosphate buffer, containing 500 mM NaCl (pH 7.5). The gel is then transferred to a tube, mixed with 100 mM ethanolamine (pH 7.5) for 4 hours at room temperature, and washed twice with 2 column volumes of PBS. The gel is then stored in 1/10,000 PBS/merthiolate. The amount of IgG antibodies coupled to the gel is determined by measuring the optical density (OD) at 280 nm of the IgG solution and the direct eluate, plus washings.
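The quantities involved in preparing the column can be estimated with a short calculation. The Python sketch below is illustrative: the function names are hypothetical, and the absorbance of 1.4 at 280 nm for a 1 mg/mL IgG solution is an assumed conversion factor that is not given in the text. The swelling ratio (3.5 mL of hydrated gel per gram) and the capacity range (5 to 10 mg IgG/mL of gel) are taken from the description above.

```python
# Illustrative sketch only: gel requirement and coupled IgG by difference.

def dry_gel_grams(igg_mg: float, capacity_mg_per_ml: float,
                  swell_ml_per_g: float = 3.5) -> float:
    """Grams of dried gel needed to couple igg_mg at the given capacity."""
    hydrated_ml = igg_mg / capacity_mg_per_ml
    return hydrated_ml / swell_ml_per_g


def coupled_igg_mg(input_mg: float, unbound_a280: float, unbound_volume_ml: float,
                   a280_per_mg_per_ml: float = 1.4) -> float:
    """IgG coupled to the gel = IgG applied minus IgG recovered in the
    direct eluate plus washings (assumed conversion factor: A280 of 1.4
    for 1 mg/mL IgG)."""
    unbound_mg = unbound_a280 / a280_per_mg_per_ml * unbound_volume_ml
    return input_mg - unbound_mg


# Hypothetical example: 35 mg of IgG to be coupled.
grams_at_high_capacity = dry_gel_grams(35, capacity_mg_per_ml=10)
grams_at_low_capacity = dry_gel_grams(35, capacity_mg_per_ml=5)
coupled = coupled_igg_mg(35, unbound_a280=0.14, unbound_volume_ml=20)
```

In this hypothetical example, 35 mg of IgG calls for between 1 and 2 g of dried gel, depending on where in the stated capacity range the gel performs.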

[0242] 5.C. Adsorption and Elution of the Antigen

[0243] An antigen solution in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, for example, the supernatant obtained in 3.E.1. or the solubilized pellet obtained in 3.E.2., after centrifugation and filtration through a 0.45 μm membrane, is applied to a column equilibrated with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, at a flow rate of about 10 mL/hour. The column is then washed with 20 volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Alternatively, adsorption can be achieved by mixing overnight at 5±3° C.

[0244] The adsorbed gel is washed with 2 to 6 volumes of 10 mM sodium phosphate buffer (pH 6.8) and the antigen is eluted with 100 mM glycine buffer (pH 2.5). The eluate is recovered in 3 mL fractions, to each of which is added 150 μL of 1 M sodium phosphate buffer (pH 8.0). Absorbance is measured at 280 nm for each fraction; those fractions containing the antigen are pooled and stored at −20° C.

[0245] Other embodiments are within the following claims.

1 64 2798 base pairs nucleic acid single linear Genomic DNA CodingSequence 328...2451 (A) NAME/KEY Signal Sequence (B) LOCATION 328...385(D) OTHER INFORMATION 1 TGGTCCTGGC ATTCCGAGGT TCGAATCCTT GCACCCCAGCCATTTTTCCT TATTTTTTGG 60 CGCGGAGTAG AGCAGTCCGG TAGCTCGTTG GGCTCATAACCCAAAGGTCA GTGGTTCAAA 120 TCCATTCTCC GCAACCAATC CTTTAAACCA CACCACCACCAAACGAACCA AACGAAACAA 180 AAAGCATCAA AATCAAAAAA ATGACAAAAT TTTTAAGAAAATGACAAAAA AAAAAAAAAC 240 GATTTTATGC TATATTAACG AAATCTTGTG ATAAGATCTTATTCTTTTAA AAGACTTATC 300 TAACCATTTT AATTTCAAGG AGAAAAC ATG AAA AAA ACCCTT TTA CTC TCT CTC 354 Met Lys Lys Thr Leu Leu Leu Ser Leu -15 TCT CTCTCT CTC TCG TTT TTG CTC CAC GCT GAA GAC GAC GGC TTT TAC 402 Ser Leu SerLeu Ser Phe Leu Leu His Ala Glu Asp Asp Gly Phe Tyr -10 -5 1 5 ACA AGCGTG GGC TAT CAA ATC GGT GAA GCC GCT CAA ATG GTG AAA AAC 450 Thr Ser ValGly Tyr Gln Ile Gly Glu Ala Ala Gln Met Val Lys Asn 10 15 20 ACC AAA GGCATT CAA GAG CTT TCA GAC AAT TAT GAA AAG CTG AAC AAT 498 Thr Lys Gly IleGln Glu Leu Ser Asp Asn Tyr Glu Lys Leu Asn Asn 25 30 35 CTT TTG AAT AATTAC AGC ACC CTA AAC ACC CTT ATC AAA TTG TCC GCT 546 Leu Leu Asn Asn TyrSer Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala 40 45 50 GAT CCG AGC GCG ATTAAC GAC GCA AGG GAT AAT CTA GGC TCA AGC TCT 594 Asp Pro Ser Ala Ile AsnAsp Ala Arg Asp Asn Leu Gly Ser Ser Ser 55 60 65 70 AGG AAT TTG CTT GATGTC AAA ACC AAT TCC CCC GCG TAT CAA GCC GTG 642 Arg Asn Leu Leu Asp ValLys Thr Asn Ser Pro Ala Tyr Gln Ala Val 75 80 85 CTT TTA GCA CTC AAT GCTGCA GTG GGG TTG TGG CAA GTT ACA AGC TAC 690 Leu Leu Ala Leu Asn Ala AlaVal Gly Leu Trp Gln Val Thr Ser Tyr 90 95 100 GCT TTT ACT GCT TGT GGTCCT GGC AGT AAC GAG AAT GCG AAT GGA GGG 738 Ala Phe Thr Ala Cys Gly ProGly Ser Asn Glu Asn Ala Asn Gly Gly 105 110 115 ATC CAA ACT TTT AAT AATGTG CCA GGA CAA GAT ACG ACG ACC ATC ACT 786 Ile Gln Thr Phe Asn Asn ValPro Gly Gln Asp Thr Thr Thr Ile Thr 120 125 130 TGC AAT TCG TAT TAT GAGCCA GGA CAT GGT GGG CCT ATA TCC ACT GCA 834 Cys Asn Ser Tyr Tyr Glu ProGly His Gly Gly Pro Ile Ser 
Thr Ala 135 140 145 150 AAT TAT GCG AAA ATCAAT CAA GCC TAT CAA ATC ATC CAA AAG GCT TTG 882 Asn Tyr Ala Lys Ile AsnGln Ala Tyr Gln Ile Ile Gln Lys Ala Leu 155 160 165 ACA GCC AAT GGA GCTAAT GGA GAT GGG GTC CCC GTT TTA AGC AAC ACC 930 Thr Ala Asn Gly Ala AsnGly Asp Gly Val Pro Val Leu Ser Asn Thr 170 175 180 ACT ACA AAA CTT GATTTC ACT ATC AAT GGA GAC AAA AGA ACG GGG GGC 978 Thr Thr Lys Leu Asp PheThr Ile Asn Gly Asp Lys Arg Thr Gly Gly 185 190 195 AAA CCA AAT ACA CCTGAA AAG TTC CCA TGG AGT GAT GGG AAA TAT ATT 1026 Lys Pro Asn Thr Pro GluLys Phe Pro Trp Ser Asp Gly Lys Tyr Ile 200 205 210 CAC ACC CAA TGG ATTAAC ACA ATA GTA ACA CCA ACA GAA ACA AAT ATC 1074 His Thr Gln Trp Ile AsnThr Ile Val Thr Pro Thr Glu Thr Asn Ile 215 220 225 230 AAC ACA GAA AATAAC GCT CAA GAG CTT TTA AAA CAA GCG AGC ATC ATT 1122 Asn Thr Glu Asn AsnAla Gln Glu Leu Leu Lys Gln Ala Ser Ile Ile 235 240 245 ATC ACT ACC CTAAAT GAG GCA TGC CCA AAC TTC CAA AAT GGT GGT AGA 1170 Ile Thr Thr Leu AsnGlu Ala Cys Pro Asn Phe Gln Asn Gly Gly Arg 250 255 260 AGT TAT TGG CAAGGG ATA AGC GGC AAT GGG ACA ATG TGC GGG ATG TTT 1218 Ser Tyr Trp Gln GlyIle Ser Gly Asn Gly Thr Met Cys Gly Met Phe 265 270 275 AAG AAT GAA ATCAGC GCG ATC CAA GGC ATG ATC GCT AAC GCT CAA GAA 1266 Lys Asn Glu Ile SerAla Ile Gln Gly Met Ile Ala Asn Ala Gln Glu 280 285 290 GCT GTC GCG CAAAGC AAA ATC GTT AGT GAA AAC GCG CAA AAT CAA AAC 1314 Ala Val Ala Gln SerLys Ile Val Ser Glu Asn Ala Gln Asn Gln Asn 295 300 305 310 AAC TTG GATACT GGA AAA CCA TTC AAC CCT TAC ACG GAC GCC AGC TTT 1362 Asn Leu Asp ThrGly Lys Pro Phe Asn Pro Tyr Thr Asp Ala Ser Phe 315 320 325 GCG CAA AGCATG CTC AAA AAC GCT CAA GCG CAA GCA GAG ATT TTA AAC 1410 Ala Gln Ser MetLeu Lys Asn Ala Gln Ala Gln Ala Glu Ile Leu Asn 330 335 340 CAA GCC GAACAA GTA GTA AAA AAC TTT GAA AAA ATC CCT ACA GCC TTT 1458 Gln Ala Glu GlnVal Val Lys Asn Phe Glu Lys Ile Pro Thr Ala Phe 345 350 355 GTA TCA GACTCT TTA GGG GTG TGT TAT GAA GTG CAA GGG GGT GAG CGT 1506 Val Ser Asp SerLeu Gly Val Cys Tyr Glu Val 
Gln Gly Gly Glu Arg 360 365 370 AGG GGC ACCAAT CCA GGT CAG GTA ACT TCT AAC ACT TGG GGA GCC GGT 1554 Arg Gly Thr AsnPro Gly Gln Val Thr Ser Asn Thr Trp Gly Ala Gly 375 380 385 390 TGC GCGTAT GTG AAA CAA ACC ATA ACG AAT TTA GAC AAC AGC ATC GCT 1602 Cys Ala TyrVal Lys Gln Thr Ile Thr Asn Leu Asp Asn Ser Ile Ala 395 400 405 CAC TTTGGC ACT CAA GAG CAG CAG ATA CAG CAA GCC GAA AAC ATC GCT 1650 His Phe GlyThr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn Ile Ala 410 415 420 GAC ACTCTA GTG AAT TTC AAA TCT AGA TAC AGC GAA TTA GGC AAC ACC 1698 Asp Thr LeuVal Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr 425 430 435 TAT AACAGC ATC ACC ACC GCG CTC TCC AAA GTC CCT AAC GCG CAA AGC 1746 Tyr Asn SerIle Thr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser 440 445 450 TTG CAAAAC GTG GTG AGC AAA AAG AAT AAC CCC TAT AGC CCT CAA GGC 1794 Leu Gln AsnVal Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly 455 460 465 470 ATAGAG ACC AAT TAC TAC CTC AAT CAA AAT TCT TAC AAC CAA ATC CAA 1842 Ile GluThr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln 475 480 485 ACCATC AAC CAA GAA CTA GGG CGT AAC CCC TTT AGG AAA GTG GGC ATC 1890 Thr IleAsn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile 490 495 500 GTCAAT TCT CAA ACC AAC AAT GGT GCC ATG AAT GGG ATC GGT ATT CAG 1938 Val AsnSer Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln 505 510 515 GTGGGC TAT AAG CAA TTC TTT GGC CAA AAA AGA AAA TGG GGC GCT AGG 1986 Val GlyTyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala Arg 520 525 530 TATTAC GGC TTT TTT GAC TAC AAC CAT GCG TTC ATT AAA TCC AGC TTC 2034 Tyr TyrGly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe 535 540 545 550TTC AAC TCG GCT TCT GAT GTG TGG ACT TAT GGT TTT GGA GCG GAC GCT 2082 PheAsn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala 555 560 565CTT TAT AAC TTC ATC AAC GAT AAA GCC ACC AAT TTC TTA GGC AAA AAC 2130 LeuTyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn 570 575 580AAC AAG CTT TCC GTG GGG CTT TTT GGA GGG ATT GCG TTA GCG GGC ACT 2178 AsnLys Leu Ser Val Gly Leu 
Phe Gly Gly Ile Ala Leu Ala Gly Thr 585 590 595TCA TGG CTT AAT TCT GAG TAT GTG AAT TTA GCC ACC GTG AAT AAC GTC 2226 SerTrp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val 600 605 610TAT AAC GCT AAA ATG AAT GTG GCG AAT TTC CAA TTC TTA TTC AAT ATG 2274 TyrAsn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met 615 620 625630 GGA GTG AGG ATG AAT TTA GCC AGA TCC AAG AAA AAA GGC AGC GAT CAT 2322Gly Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His 635 640645 GCG GCT CAG CAT GGG ATT GAA CTA GGG CTT AAA ATC CCC ACC ATC AAC 2370Ala Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn 650 655660 ACG AAC TAT TAT TCT TTC ATG GGG GCT GAA CTC AAA TAC AGA AGG CTT 2418Thr Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu 665 670675 TAT AGC GTG TAT TTG AAT TAT GTG TTC GCT TAC TAAGCTTTTT GTGAAACTCC2471 Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr 680 685 CTTTTTAAGGGGTTTTTTTT TGAACTCTCT TTTTAAATTC TCTTTTTAAA GAGATTTCTT 2531 TTTTTTAAGCTTTTTTTTGA ATTCTTTTTT TTGAATTCTT TGTTTTTAAG CTTTTTTTAA 2591 ACCCTTTCGTTTTTAAACTC CCTTTTTTAA GGGATTTCTT TTTTTAAACT CTTTTTTTTT 2651 AAACTCTTTTTTTTAAACCC TCTTTTTTTA AGGGATTTCT TTTTAAAGCT TTTTTGAAGT 2711 CTTTTTTTAAATTCTTTTTT TGGGGGTTTG ATCTTTCTTT TTGCCAATCC CCACTACTTT 2771 CGCTTTTTAATCTTTAGGTT TTATTTT 2798 708 amino acids amino acid single linear proteininternal Signal Sequence 1...19 2 Met Lys Lys Thr Leu Leu Leu Ser LeuSer Leu Ser Leu Ser Phe Leu -15 -10 -5 Leu His Ala Glu Asp Asp Gly PheTyr Thr Ser Val Gly Tyr Gln Ile 1 5 10 Gly Glu Ala Ala Gln Met Val LysAsn Thr Lys Gly Ile Gln Glu Leu 15 20 25 Ser Asp Asn Tyr Glu Lys Leu AsnAsn Leu Leu Asn Asn Tyr Ser Thr 30 35 40 45 Leu Asn Thr Leu Ile Lys LeuSer Ala Asp Pro Ser Ala Ile Asn Asp 50 55 60 Ala Arg Asp Asn Leu Gly SerSer Ser Arg Asn Leu Leu Asp Val Lys 65 70 75 Thr Asn Ser Pro Ala Tyr GlnAla Val Leu Leu Ala Leu Asn Ala Ala 80 85 90 Val Gly Leu Trp Gln Val ThrSer Tyr Ala Phe Thr Ala Cys Gly Pro 95 100 105 Gly Ser Asn Glu Asn AlaAsn Gly Gly Ile Gln Thr Phe Asn Asn Val 
110 115 120 125 Pro Gly Gln AspThr Thr Thr Ile Thr Cys Asn Ser Tyr Tyr Glu Pro 130 135 140 Gly His GlyGly Pro Ile Ser Thr Ala Asn Tyr Ala Lys Ile Asn Gln 145 150 155 Ala TyrGln Ile Ile Gln Lys Ala Leu Thr Ala Asn Gly Ala Asn Gly 160 165 170 AspGly Val Pro Val Leu Ser Asn Thr Thr Thr Lys Leu Asp Phe Thr 175 180 185Ile Asn Gly Asp Lys Arg Thr Gly Gly Lys Pro Asn Thr Pro Glu Lys 190 195200 205 Phe Pro Trp Ser Asp Gly Lys Tyr Ile His Thr Gln Trp Ile Asn Thr210 215 220 Ile Val Thr Pro Thr Glu Thr Asn Ile Asn Thr Glu Asn Asn AlaGln 225 230 235 Glu Leu Leu Lys Gln Ala Ser Ile Ile Ile Thr Thr Leu AsnGlu Ala 240 245 250 Cys Pro Asn Phe Gln Asn Gly Gly Arg Ser Tyr Trp GlnGly Ile Ser 255 260 265 Gly Asn Gly Thr Met Cys Gly Met Phe Lys Asn GluIle Ser Ala Ile 270 275 280 285 Gln Gly Met Ile Ala Asn Ala Gln Glu AlaVal Ala Gln Ser Lys Ile 290 295 300 Val Ser Glu Asn Ala Gln Asn Gln AsnAsn Leu Asp Thr Gly Lys Pro 305 310 315 Phe Asn Pro Tyr Thr Asp Ala SerPhe Ala Gln Ser Met Leu Lys Asn 320 325 330 Ala Gln Ala Gln Ala Glu IleLeu Asn Gln Ala Glu Gln Val Val Lys 335 340 345 Asn Phe Glu Lys Ile ProThr Ala Phe Val Ser Asp Ser Leu Gly Val 350 355 360 365 Cys Tyr Glu ValGln Gly Gly Glu Arg Arg Gly Thr Asn Pro Gly Gln 370 375 380 Val Thr SerAsn Thr Trp Gly Ala Gly Cys Ala Tyr Val Lys Gln Thr 385 390 395 Ile ThrAsn Leu Asp Asn Ser Ile Ala His Phe Gly Thr Gln Glu Gln 400 405 410 GlnIle Gln Gln Ala Glu Asn Ile Ala Asp Thr Leu Val Asn Phe Lys 415 420 425Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr Asn Ser Ile Thr Thr Ala 430 435440 445 Leu Ser Lys Val Pro Asn Ala Gln Ser Leu Gln Asn Val Val Ser Lys450 455 460 Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile Glu Thr Asn Tyr TyrLeu 465 470 475 Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr Ile Asn Gln GluLeu Gly 480 485 490 Arg Asn Pro Phe Arg Lys Val Gly Ile Val Asn Ser GlnThr Asn Asn 495 500 505 Gly Ala Met Asn Gly Ile Gly Ile Gln Val Gly TyrLys Gln Phe Phe 510 515 520 525 Gly Gln Lys Arg Lys Trp Gly Ala Arg TyrTyr Gly Phe Phe Asp Tyr 530 535 540 Asn His Ala Phe 
Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val 545 550 555 Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu Tyr Asn Phe Ile Asn Asp 560 565 570 Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Val Gly Leu 575 580 585 Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Glu Tyr 590 595 600 605 Val Asn Leu Ala Thr Val Asn Asn Val Tyr Asn Ala Lys Met Asn Val 610 615 620 Ala Asn Phe Gln Phe Leu Phe Asn Met Gly Val Arg Met Asn Leu Ala 625 630 635 Arg Ser Lys Lys Lys Gly Ser Asp His Ala Ala Gln His Gly Ile Glu 640 645 650 Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met 655 660 665 Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr 670 675 680 685 Val Phe Ala Tyr
2699 base pairs nucleic acid single linear Genomic DNA Coding Sequence 199...2397 (A) NAME/KEY Signal Sequence (B) LOCATION 199...259 (D) OTHER INFORMATION 3 TAAAATCCAA TTAAAAGCGT TCAAAGGTAA CGCAAAAAAA CAAAAAATGA CGCAATTTTT 60 TCAAAATGAC AAAAAAAAAC GCTTTATGCT ATAATACCCC AAATACATTC TAATAGCAAA 120 TGCGTTCTAA TGCAAATGCA TTCCAATGTA TGAAATCCCT AATACTAAAT CCAATTTAAT 180 CCAAAAAGGA GAAAAAAC ATG AAA AAA CAC ATC CTT TCA TTA GCT TTA GGC 231 Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly -20 -15 -10 TCG CTT TTA GTT TCC ACT TTG AGC GCT GAA GAC GAC GGC TTT TAC ACA 279 Ser Leu Leu Val Ser Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr -5 1 5 AGC GTA GGC TAT CAG ATC GGT GAA GCC GCT CAA ATG GTA ACA AAC ACC 327 Ser Val Gly Tyr Gln Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr 10 15 20 AAA GGC ATC CAA CAG CTT TCA GAC AAT TAT GAA AAT TTG AAC AAC CTT 375 Lys Gly Ile Gln Gln Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu 25 30 35 TTA ACG AGA TAC AGC ACC CTA AAC ACC CTT ATC AAA TTG TCC GCT GAT 423 Leu Thr Arg Tyr Ser Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp 40 45 50 55 CCG AGC GCA ATT AAT GCG GTG CGG GAA AAT CTG GGC GCG AGC GCG AAG 471 Pro Ser Ala Ile Asn Ala Val Arg Glu Asn Leu Gly Ala Ser Ala Lys 60 65 70 AAT TTG ATC GGC GAT AAA GCC AAC TCC CCC GCC TAT CAA GCC GTG CTT 519 Asn Leu Ile Gly Asp Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Leu 75 80
85 TTA GCG ATC AAC GCG GCG GTA GGG TTT TGG AAT GTC GTG GGC TAT GTG567 Leu Ala Ile Asn Ala Ala Val Gly Phe Trp Asn Val Val Gly Tyr Val 9095 100 ACG CAA TGT GGG GGT AAC GCC AAT GGT CAA GAA AGC ACC TCT TCA ACC615 Thr Gln Cys Gly Gly Asn Ala Asn Gly Gln Glu Ser Thr Ser Ser Thr 105110 115 ACC ATC TTC AAC AAC GAG CCA GGG TAT CGA TCC ACT TCC ATC ACT TGT663 Thr Ile Phe Asn Asn Glu Pro Gly Tyr Arg Ser Thr Ser Ile Thr Cys 120125 130 135 TCT TTG AAC GGG CAT AAG CCT GGA TAC TAT GGC CCT ATG AGC ATTGAG 711 Ser Leu Asn Gly His Lys Pro Gly Tyr Tyr Gly Pro Met Ser Ile Glu140 145 150 AAT TTT AAA AAG CTT AAC GAA GCC TAT CAG ATC CTC CAA ACG GCTTTA 759 Asn Phe Lys Lys Leu Asn Glu Ala Tyr Gln Ile Leu Gln Thr Ala Leu155 160 165 AAA AAC GGC TTA CCC GCG CTC AAA GAA AAC AAC GGG AAG GTC AGTGTA 807 Lys Asn Gly Leu Pro Ala Leu Lys Glu Asn Asn Gly Lys Val Ser Val170 175 180 ACC TAT ACC TAC ACA TGC TCA GGG CAA GGG AAT AAT AAC TGC TCGCCA 855 Thr Tyr Thr Tyr Thr Cys Ser Gly Gln Gly Asn Asn Asn Cys Ser Pro185 190 195 AGT GTC AAC GGA ACC AAA ACC ACA ACC CAA ACC ATA GAC GGC AAAAGC 903 Ser Val Asn Gly Thr Lys Thr Thr Thr Gln Thr Ile Asp Gly Lys Ser200 205 210 215 GTA ACC ACC ACG ATC AGT TCA AAA GTG GTT GGT AGC ATC GCTAGT GGC 951 Val Thr Thr Thr Ile Ser Ser Lys Val Val Gly Ser Ile Ala SerGly 220 225 230 AAC ACA TCA CAT GTC ATC ACC AAC AAA TTA GAC GGT GTG CCTGAT AGC 999 Asn Thr Ser His Val Ile Thr Asn Lys Leu Asp Gly Val Pro AspSer 235 240 245 GCT CAA GCG CTC TTA GCG CAA GCG AGC ACG CTC ATC AAC ACCATC AAC 1047 Ala Gln Ala Leu Leu Ala Gln Ala Ser Thr Leu Ile Asn Thr IleAsn 250 255 260 GAA GCA TGC CCG TAT TTC CAT GCT ACT AAT AGT AGT GAG GCTAAC GCC 1095 Glu Ala Cys Pro Tyr Phe His Ala Thr Asn Ser Ser Glu Ala AsnAla 265 270 275 CCA AAA TTC TCT ACT ACT ACT GGG AAA ATA TGC GGC GCT TTTTCA GAA 1143 Pro Lys Phe Ser Thr Thr Thr Gly Lys Ile Cys Gly Ala Phe SerGlu 280 285 290 295 GAA ATC AGC GCG ATC CAA AAG ATG ATC ACG GAC GCG CAAGAG CTA GTT 1191 Glu Ile Ser Ala Ile Gln Lys Met Ile Thr Asp Ala Gln GluLeu Val 300 
305 310 AAT CAA ACG AGC GTC ATT AAC AGC AAC GAA CAA TCA ACTCCG GTA GGC 1239 Asn Gln Thr Ser Val Ile Asn Ser Asn Glu Gln Ser Thr ProVal Gly 315 320 325 AAT AAT AAT GGC AAG CCT TTC AAC CCT TTC ACG GAC GCAAGT TTT GCG 1287 Asn Asn Asn Gly Lys Pro Phe Asn Pro Phe Thr Asp Ala SerPhe Ala 330 335 340 CAA GGC ATG CTC GCT AAC GCT AGC GCG CAA GCT AAA ATGCTC AAT TTA 1335 Gln Gly Met Leu Ala Asn Ala Ser Ala Gln Ala Lys Met LeuAsn Leu 345 350 355 GCC CAT CAG GTG GGG CAA GCC ATT AAC CCA GAG AAT CTTAGC GAG AAT 1383 Ala His Gln Val Gly Gln Ala Ile Asn Pro Glu Asn Leu SerGlu Asn 360 365 370 375 TTT AAA AAT TTT GTT ACA GGC TTT TTA GCC ACA TGCAAT AAC AAA TCA 1431 Phe Lys Asn Phe Val Thr Gly Phe Leu Ala Thr Cys AsnAsn Lys Ser 380 385 390 ACA GCT GGC ACT GGT GGC ACA CAA GGT TCA GCT CCAGGC ACA GTG ACC 1479 Thr Ala Gly Thr Gly Gly Thr Gln Gly Ser Ala Pro GlyThr Val Thr 395 400 405 ACT CAA ACT TTC GCT TCT GGT TGC GCG TAT GTG GAGCAA ACC CTA ACG 1527 Thr Gln Thr Phe Ala Ser Gly Cys Ala Tyr Val Glu GlnThr Leu Thr 410 415 420 AAC TTA GGC AAC AGC ATC GCT CAC TTT GGC ACT CAAGAG CAG CAG ATA 1575 Asn Leu Gly Asn Ser Ile Ala His Phe Gly Thr Gln GluGln Gln Ile 425 430 435 CAG CAA GCC GAA AAC ATC GCT GAC ACT CTA GTG AATTTC AAA TCT AGA 1623 Gln Gln Ala Glu Asn Ile Ala Asp Thr Leu Val Asn PheLys Ser Arg 440 445 450 455 TAC AGC GAA TTA GGC AAC ACC TAT AAC AGC ATCACC ACC GCG CTC TCC 1671 Tyr Ser Glu Leu Gly Asn Thr Tyr Asn Ser Ile ThrThr Ala Leu Ser 460 465 470 AAA GTC CCT AAC GCG CAA AGC TTG CAA AAC GTGGTG AGC AAA AAG AAT 1719 Lys Val Pro Asn Ala Gln Ser Leu Gln Asn Val ValSer Lys Lys Asn 475 480 485 AAC CCC TAT AGC CCT CAA GGC ATA GAG ACC AATTAC TAC CTC AAT CAA 1767 Asn Pro Tyr Ser Pro Gln Gly Ile Glu Thr Asn TyrTyr Leu Asn Gln 490 495 500 AAT TCT TAC AAC CAA ATC CAA ACC ATC AAC CAAGAA CTA GGG CGT AAC 1815 Asn Ser Tyr Asn Gln Ile Gln Thr Ile Asn Gln GluLeu Gly Arg Asn 505 510 515 CCC TTT AGG AAA GTG GGC ATC GTC AAT TCT CAAACC AAC AAT GGT GCC 1863 Pro Phe Arg Lys Val Gly Ile Val Asn Ser Gln ThrAsn Asn 
Gly Ala 520 525 530 535 ATG AAT GGG ATC GGT ATT CAG GTG GGC TAT AAG CAA TTC TTT GGC CAA 1911 Met Asn Gly Ile Gly Ile Gln Val Gly Tyr Lys Gln Phe Phe Gly Gln 540 545 550 AAA AGA AAA TGG GGC GCT AGG TAT TAC GGC TTT TTT GAT TAC AAC CAT 1959 Lys Arg Lys Trp Gly Ala Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His 555 560 565 GCG TTC ATC AAA TCC AGC TTT TTC AAC TCG GCT TCT GAC GTG TGG ACT 2007 Ala Phe Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val Trp Thr 570 575 580 TAT GGT TTT GGA GCG GAC GCG CTT TAT AAC TTC ATC AAC GAT AAA GCC 2055 Tyr Gly Phe Gly Ala Asp Ala Leu Tyr Asn Phe Ile Asn Asp Lys Ala 585 590 595 ACC AAT TTC TTA GGC AAA AAC AAC AAG CTT TCT TTG GGG CTT TTT GGC 2103 Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Leu Gly Leu Phe Gly 600 605 610 615 GGG ATT GCG TTA GCG GGC ACT TCA TGG CTC AAT TCT GAG TAC GTG AAT 2151 Gly Ile Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Glu Tyr Val Asn 620 625 630 TTA GCC ACC GTG AAT AAC GTC TAT AAC GCT AAA ATG AAT GTG GCG AAT 2199 Leu Ala Thr Val Asn Asn Val Tyr Asn Ala Lys Met Asn Val Ala Asn 635 640 645 TTC CAA TTC TTA TTC AAT ATG GGA GTG AGG ATG AAT TTA GCC AGA TCC 2247 Phe Gln Phe Leu Phe Asn Met Gly Val Arg Met Asn Leu Ala Arg Ser 650 655 660 AAG AAA AAA GGC AGC GAT CAT GCA GCT CAG CAT GGG ATT GAG TTA GGG 2295 Lys Lys Lys Gly Ser Asp His Ala Ala Gln His Gly Ile Glu Leu Gly 665 670 675 CTT AAA ATC CCC ACC ATC AAC ACG AAC TAT TAT TCC TTT ATG GGG GCT 2343 Leu Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly Ala 680 685 690 695 GAA CTC AAA TAC AGA AGG CTC TAT AGC GTG TAT TTG AAC TAT GTG TTC 2391 Glu Leu Lys Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe 700 705 710 GCT TAC TAATGTTTGG CTCTTTGTGA AACTCCCTTT TTAAGGGGTT TTTTTTTGAA CT 2449 Ala Tyr CTCTTTTTAA ATTCTCTTTT TAAAGAGATT TCTTTTTTTT AAGCTTTTTT TTGAATTCTT 2509 TTTTTTTGAA TTCTTTGTTT TTAAGCTTTT TTTAAACCCT TTCGTTTTTA AACTCCCTTT 2569 TTTAAGGGAT TTCTTTTTTT GAACTCCCTT TTTTGAACCC TTTTTTTTAA ACCCTCTTTT 2629 TTTAAGGGGT TTCTTTTTAA AGCTTTTTTG AAGTCTTTTT TTAAATTCTT TTTTTGGGGG 2689 TTTGATCTTT 2699
733 amino acids amino
acid single linear protein internal Signal Sequence 1...20 4 Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val Ser -20 -15 -10 -5 Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln 1 5 10 Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln Gln 15 20 25 Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Ser 30 35 40 Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile Asn 45 50 55 60 Ala Val Arg Glu Asn Leu Gly Ala Ser Ala Lys Asn Leu Ile Gly Asp 65 70 75 Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Leu Leu Ala Ile Asn Ala 80 85 90 Ala Val Gly Phe Trp Asn Val Val Gly Tyr Val Thr Gln Cys Gly Gly 95 100 105 Asn Ala Asn Gly Gln Glu Ser Thr Ser Ser Thr Thr Ile Phe Asn Asn 110 115 120 Glu Pro Gly Tyr Arg Ser Thr Ser Ile Thr Cys Ser Leu Asn Gly His 125 130 135 140 Lys Pro Gly Tyr Tyr Gly Pro Met Ser Ile Glu Asn Phe Lys Lys Leu 145 150 155 Asn Glu Ala Tyr Gln Ile Leu Gln Thr Ala Leu Lys Asn Gly Leu Pro 160 165 170 Ala Leu Lys Glu Asn Asn Gly Lys Val Ser Val Thr Tyr Thr Tyr Thr 175 180 185 Cys Ser Gly Gln Gly Asn Asn Asn Cys Ser Pro Ser Val Asn Gly Thr 190 195 200 Lys Thr Thr Thr Gln Thr Ile Asp Gly Lys Ser Val Thr Thr Thr Ile 205 210 215 220 Ser Ser Lys Val Val Gly Ser Ile Ala Ser Gly Asn Thr Ser His Val 225 230 235 Ile Thr Asn Lys Leu Asp Gly Val Pro Asp Ser Ala Gln Ala Leu Leu 240 245 250 Ala Gln Ala Ser Thr Leu Ile Asn Thr Ile Asn Glu Ala Cys Pro Tyr 255 260 265 Phe His Ala Thr Asn Ser Ser Glu Ala Asn Ala Pro Lys Phe Ser Thr 270 275 280 Thr Thr Gly Lys Ile Cys Gly Ala Phe Ser Glu Glu Ile Ser Ala Ile 285 290 295 300 Gln Lys Met Ile Thr Asp Ala Gln Glu Leu Val Asn Gln Thr Ser Val 305 310 315 Ile Asn Ser Asn Glu Gln Ser Thr Pro Val Gly Asn Asn Asn Gly Lys 320 325 330 Pro Phe Asn Pro Phe Thr Asp Ala Ser Phe Ala Gln Gly Met Leu Ala 335 340 345 Asn Ala Ser Ala Gln Ala Lys Met Leu Asn Leu Ala His Gln Val Gly 350 355 360 Gln Ala Ile Asn Pro Glu Asn Leu Ser Glu Asn Phe Lys Asn Phe Val 365 370 375 380 Thr Gly Phe Leu Ala Thr Cys Asn Asn Lys Ser Thr Ala Gly Thr Gly
385 390 395 Gly Thr Gln Gly Ser Ala Pro Gly Thr Val Thr Thr Gln Thr Phe Ala 400 405 410 Ser Gly Cys Ala Tyr Val Glu Gln Thr Leu Thr Asn Leu Gly Asn Ser 415 420 425 Ile Ala His Phe Gly Thr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn 430 435 440 Ile Ala Asp Thr Leu Val Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly 445 450 455 460 Asn Thr Tyr Asn Ser Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala 465 470 475 Gln Ser Leu Gln Asn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro 480 485 490 Gln Gly Ile Glu Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln 495 500 505 Ile Gln Thr Ile Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val 510 515 520 Gly Ile Val Asn Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly 525 530 535 540 Ile Gln Val Gly Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly 545 550 555 Ala Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser 560 565 570 Ser Phe Phe Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala 575 580 585 Asp Ala Leu Tyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly 590 595 600 Lys Asn Asn Lys Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala 605 610 615 620 Gly Thr Ser Trp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn 625 630 635 Asn Val Tyr Asn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe 640 645 650 Asn Met Gly Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser 655 660 665 Asp His Ala Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr 670 675 680 Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg 685 690 695 700 Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr 705 710
2915 base pairs nucleic acid single linear Genomic DNA Coding Sequence 365...2597 (A) NAME/KEY Signal Sequence (B) LOCATION 365...425 (D) OTHER INFORMATION 5 TTTTAGGCGA CAAAATCGCT TATGTTGGGG ATAAAGGCAA CCCGCACAAT TTCGCTCACA 60 AGAAATAAAC CGCTCATAAG GGGCAAACGC CCCAAAAAAG CGATTTTTAA AGAGGTTACG 120 GCAAAATCAA GCTCTTTAGT ATTTAATCTT AAAAAATGCT AAAAGCCTTT TTATGGGCTA 180 ACACCACACA AAAAGCATCA AAATCAAAAA AATGACAAAA TTTTTAAGAA AATGACAAAA 240 AAAAACGCTT TATGCTATAA TACCCCAAAT ACATTCTAAT AGCAAATGCG
TTCTAATGCA 300AATGCATTCC AATGTATGAA ATCCCTAATA CTAAATCCAA TTTAATCCAA AAAGGAGAAA 360AAAC ATG AAA AAA CAC ATC CTT TCA TTA GCT TTA GGC TCG CTT TTA GTT 409 MetLys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val -20 -15 -10 TCCACT TTG AGC GCT GAA GAC GAC GGC TTT TAC ACA AGC GTA GGC TAT 457 Ser ThrLeu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr -5 1 5 10 CAGATC GGT GAA GCC GCT CAA ATG GTA ACA AAC ACC AAA GGC ATC CAA 505 Gln IleGly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln 15 20 25 CAG CTTTCA GAC AAT TAT GAA AAT TTG AAC AAC CTT TTA ACG AGA TAC 553 Gln Leu SerAsp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr 30 35 40 AGC ACC CTAAAC ACC CTT ATC AAA TTG TCC GCT GAT CCG AGC GCA ATT 601 Ser Thr Leu AsnThr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile 45 50 55 AAT GCG GTG CGGGAA AAT CTG GGC GCG AGC ACG AAG AAT TTG ATC GGC 649 Asn Ala Val Arg GluAsn Leu Gly Ala Ser Thr Lys Asn Leu Ile Gly 60 65 70 75 GAT AAA GCC AACTCC CCG GCG TAT CAA GCC GTG TTT TTA GCG ATC AAC 697 Asp Lys Ala Asn SerPro Ala Tyr Gln Ala Val Phe Leu Ala Ile Asn 80 85 90 GCG GCG GTA GGG TTGTGG AAT ACC ATC GGC TAT GCG GTC ATG TGC GGG 745 Ala Ala Val Gly Leu TrpAsn Thr Ile Gly Tyr Ala Val Met Cys Gly 95 100 105 AAC GGG AAC GGC ACAGAG AGT GGG CCT GGC AGC GTG ATC TTT AAT GAC 793 Asn Gly Asn Gly Thr GluSer Gly Pro Gly Ser Val Ile Phe Asn Asp 110 115 120 CAA CCA GGA CAG GATTCC ACG CAA ATT ACT TGC AAC CGC TTT GAA TCA 841 Gln Pro Gly Gln Asp SerThr Gln Ile Thr Cys Asn Arg Phe Glu Ser 125 130 135 ACT GGG CCT GGT AAAAGC ATG TCT ATT GAT GAA TTC AAA AAA CTC AAT 889 Thr Gly Pro Gly Lys SerMet Ser Ile Asp Glu Phe Lys Lys Leu Asn 140 145 150 155 GAA GCC TAT CAAATC ATC CAG CAA GCT TTA AAA AAT CAA AGT GGG TTT 937 Glu Ala Tyr Gln IleIle Gln Gln Ala Leu Lys Asn Gln Ser Gly Phe 160 165 170 CCT GAA TTA GGCGGG AAC GGC ACA AAA GTG AGT GTT AAT TAC AAT TAC 985 Pro Glu Leu Gly GlyAsn Gly Thr Lys Val Ser Val Asn Tyr Asn Tyr 175 180 185 GAA TGC AGA CAAACT GCT GAT ATC AAC GGC GGT GTG TAT CAG TTC TGC 1033 Glu Cys Arg Gln 
ThrAla Asp Ile Asn Gly Gly Val Tyr Gln Phe Cys 190 195 200 AAG GCT AAA AATGGT AGT AGT AGC AGT AGT AAT GGC GGT AAT GGC AGT 1081 Lys Ala Lys Asn GlySer Ser Ser Ser Ser Asn Gly Gly Asn Gly Ser 205 210 215 AGC ACG CAA ACAACC GCG ACA ACC ACG CAA GAC GGC GTA ACG ATC ACC 1129 Ser Thr Gln Thr ThrAla Thr Thr Thr Gln Asp Gly Val Thr Ile Thr 220 225 230 235 ACT ACC TATAAT AAT AAC AAA GCC ACC GTC AAA TTT GAC ATC ACC AAT 1177 Thr Thr Tyr AsnAsn Asn Lys Ala Thr Val Lys Phe Asp Ile Thr Asn 240 245 250 AAC GCT GAACAG CTG TTA AAT CAA GCG GCA AAC ATC ATG CAA GTC CTT 1225 Asn Ala Glu GlnLeu Leu Asn Gln Ala Ala Asn Ile Met Gln Val Leu 255 260 265 AAT ACG CAATGC CCT TTA GTG CGT TCC ACG AAT AAC GAA AAC ACT CCA 1273 Asn Thr Gln CysPro Leu Val Arg Ser Thr Asn Asn Glu Asn Thr Pro 270 275 280 GGG GGT GGTCAA CCA TGG GGT TTA AGC ACA TCC GGG AAT GCG TGC AGC 1321 Gly Gly Gly GlnPro Trp Gly Leu Ser Thr Ser Gly Asn Ala Cys Ser 285 290 295 ATC TTC CAACAA GAA TTT AGC CAG GTT ACT AGC ATG ATC AAA AAC GCC 1369 Ile Phe Gln GlnGlu Phe Ser Gln Val Thr Ser Met Ile Lys Asn Ala 300 305 310 315 CAA GAAATA ATC GCG CAA AGC AAA ATC GTT AGT GAA AAC GCG CAA AAT 1417 Gln Glu IleIle Ala Gln Ser Lys Ile Val Ser Glu Asn Ala Gln Asn 320 325 330 CAA AACAAC TTG GAT ACT GGA AAA CCA TTC AAC CCT TAC ACG GAC GCC 1465 Gln Asn AsnLeu Asp Thr Gly Lys Pro Phe Asn Pro Tyr Thr Asp Ala 335 340 345 AGC TTTGCG CAA AGC ATG CTC AAA AAC GCT CAA GCG CAA GCA GAG ATG 1513 Ser Phe AlaGln Ser Met Leu Lys Asn Ala Gln Ala Gln Ala Glu Met 350 355 360 TTC AATTTG AGC GAA CAA GTG AAA AAG AAC TTG GAA GTC ATG AAA AAC 1561 Phe Asn LeuSer Glu Gln Val Lys Lys Asn Leu Glu Val Met Lys Asn 365 370 375 AAC AATAAT GTT AAC GAG AAA TTA GCA GGA TTT GGG AAA GAA GAA GTA 1609 Asn Asn AsnVal Asn Glu Lys Leu Ala Gly Phe Gly Lys Glu Glu Val 380 385 390 395 ATGACC AAT TTT GTT AGC GCC TTT TTG GCA AGC TGC AAA GAT GGT GGC 1657 Met ThrAsn Phe Val Ser Ala Phe Leu Ala Ser Cys Lys Asp Gly Gly 400 405 410 ACATTG CCT AAT GCA GGG GTT ACT TCT AAC ACT TGG GGG GCG GGT TGC 1705 
Thr LeuPro Asn Ala Gly Val Thr Ser Asn Thr Trp Gly Ala Gly Cys 415 420 425 GCGTAT GTG GGA GAG ACG ATA AGC GCC CTA ACC AAC AGC ATC GCT CAC 1753 Ala TyrVal Gly Glu Thr Ile Ser Ala Leu Thr Asn Ser Ile Ala His 430 435 440 TTTGGC ACT CAA GAG CAG CAG ATA CAG CAA GCC GAA AAC ATC GCT GAC 1801 Phe GlyThr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn Ile Ala Asp 445 450 455 ACTCTA GTG AAT TTC AAA TCT AGA TAC AGC GAA TTA GGC AAC ACC TAT 1849 Thr LeuVal Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr 460 465 470 475AAC AGC ATC ACC ACC GCG CTC TCC AAA GTC CCT AAC GCG CAA AGC TTG 1897 AsnSer Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser Leu 480 485 490CAA AAC GTG GTG AGC AAA AAG AAT AAC CCC TAT AGC CCT CAA GGC ATA 1945 GlnAsn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile 495 500 505GAG ACC AAT TAC TAC CTC AAT CAA AAT TCT TAC AAC CAA ATC CAA ACC 1993 GluThr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr 510 515 520ATC AAC CAA GAA CTA GGG CGT AAC CCC TTT AGG AAA GTG GGC ATC GTC 2041 IleAsn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile Val 525 530 535AAT TCT CAA ACC AAC AAT GGT GCC ATG AAT GGG ATC GGC ATT CAG GTG 2089 AsnSer Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln Val 540 545 550555 GGC TAT AAG CAA TTC TTT GGC CAA AAA AGA AAA TGG GGC GCT AGG TAT 2137Gly Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala Arg Tyr 560 565570 TAC GGC TTT TTT GAT TAC AAC CAT GCG TTC ATC AAA TCC AGC TTT TTC 2185Tyr Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe Phe 575 580585 AAC TCG GCT TCT GAC GTG TGG ACT TAT GGT TTT GGA GCG GAC GCG CTT 2233Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu 590 595600 TAT AAC TTC ATC AAC GAT AAA GCC ACC AAT TTC TTA GGC AAA AAC AAC 2281Tyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn 605 610615 AAG CTT TCT TTG GGG CTT TTT GGC GGG ATT GCG TTA GCG GGC ACT TCA 2329Lys Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser 620 625630 635 TGG CTC AAT TCT GAG TAC GTG AAT TTA GCC ACC GTG AAT 
AAC GTC TAT 2377 Trp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val Tyr 640 645 650 AAC GCT AAA ATG AAT GTG GCG AAT TTC CAA TTC TTA TTC AAT ATG GGA 2425 Asn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met Gly 655 660 665 GTG AGG ATG AAT TTA GCC AGA TCC AAG AAA AAA GGC AGC GAT CAT GCA 2473 Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His Ala 670 675 680 GCT CAG CAT GGG ATT GAG TTA GGG CTT AAA ATC CCC ACC ATC AAC ACG 2521 Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr 685 690 695 AAC TAT TAT TCC TTT ATG GGG GCT GAA CTC AAA TAC AGA AGG CTC TAT 2569 Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr 700 705 710 715 AGC GTG TAT TTG AAT NAT GTG TTC GCT TAC TAAGCTTTTT GTGAAACTCC 2619 Ser Val Tyr Leu Asn Xaa Val Phe Ala Tyr 720 725 CTTTTTAAGG GGTTTTTTTT TGAACTCTCT TTTAAATTCT CTTTTTAAAG AGATTTCTTT 2679 TTTTAAGCTT TTTTTTGAAC TTTTTTTTGA ATTCTTTGTT TTTAAGCTTT TTTTAAACCC 2739 TTTCGTTTTT AAACTCCCTT TTTTAAGGGA TTTCTTTTTT TGAACTCCCT TTTTTGAACC 2799 CTTTTTTTTA AACCCTCTTT TTTTAAGGGG TTTCTTTTTA AAGCTTTTTT GAAGTCTTTT 2859 TTTAAATTCT TTTTTTGGGG GTTTGATCTT TCTTTTTGCC AATCCCCACT ACTTTC 2915
745 amino acids amino acid single linear protein internal Signal Sequence 1...20 6 Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val Ser -20 -15 -10 -5 Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln 1 5 10 Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln Gln 15 20 25 Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Ser 30 35 40 Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile Asn 45 50 55 60 Ala Val Arg Glu Asn Leu Gly Ala Ser Thr Lys Asn Leu Ile Gly Asp 65 70 75 Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Phe Leu Ala Ile Asn Ala 80 85 90 Ala Val Gly Leu Trp Asn Thr Ile Gly Tyr Ala Val Met Cys Gly Asn 95 100 105 Gly Asn Gly Thr Glu Ser Gly Pro Gly Ser Val Ile Phe Asn Asp Gln 110 115 120 Pro Gly Gln Asp Ser Thr Gln Ile Thr Cys Asn Arg Phe Glu Ser Thr 125 130 135 140 Gly Pro Gly Lys Ser Met Ser Ile Asp Glu Phe Lys Lys
Leu Asn Glu 145 150 155 Ala Tyr Gln Ile Ile Gln Gln AlaLeu Lys Asn Gln Ser Gly Phe Pro 160 165 170 Glu Leu Gly Gly Asn Gly ThrLys Val Ser Val Asn Tyr Asn Tyr Glu 175 180 185 Cys Arg Gln Thr Ala AspIle Asn Gly Gly Val Tyr Gln Phe Cys Lys 190 195 200 Ala Lys Asn Gly SerSer Ser Ser Ser Asn Gly Gly Asn Gly Ser Ser 205 210 215 220 Thr Gln ThrThr Ala Thr Thr Thr Gln Asp Gly Val Thr Ile Thr Thr 225 230 235 Thr TyrAsn Asn Asn Lys Ala Thr Val Lys Phe Asp Ile Thr Asn Asn 240 245 250 AlaGlu Gln Leu Leu Asn Gln Ala Ala Asn Ile Met Gln Val Leu Asn 255 260 265Thr Gln Cys Pro Leu Val Arg Ser Thr Asn Asn Glu Asn Thr Pro Gly 270 275280 Gly Gly Gln Pro Trp Gly Leu Ser Thr Ser Gly Asn Ala Cys Ser Ile 285290 295 300 Phe Gln Gln Glu Phe Ser Gln Val Thr Ser Met Ile Lys Asn AlaGln 305 310 315 Glu Ile Ile Ala Gln Ser Lys Ile Val Ser Glu Asn Ala GlnAsn Gln 320 325 330 Asn Asn Leu Asp Thr Gly Lys Pro Phe Asn Pro Tyr ThrAsp Ala Ser 335 340 345 Phe Ala Gln Ser Met Leu Lys Asn Ala Gln Ala GlnAla Glu Met Phe 350 355 360 Asn Leu Ser Glu Gln Val Lys Lys Asn Leu GluVal Met Lys Asn Asn 365 370 375 380 Asn Asn Val Asn Glu Lys Leu Ala GlyPhe Gly Lys Glu Glu Val Met 385 390 395 Thr Asn Phe Val Ser Ala Phe LeuAla Ser Cys Lys Asp Gly Gly Thr 400 405 410 Leu Pro Asn Ala Gly Val ThrSer Asn Thr Trp Gly Ala Gly Cys Ala 415 420 425 Tyr Val Gly Glu Thr IleSer Ala Leu Thr Asn Ser Ile Ala His Phe 430 435 440 Gly Thr Gln Glu GlnGln Ile Gln Gln Ala Glu Asn Ile Ala Asp Thr 445 450 455 460 Leu Val AsnPhe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr Asn 465 470 475 Ser IleThr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser Leu Gln 480 485 490 AsnVal Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile Glu 495 500 505Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr Ile 510 515520 Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile Val Asn 525530 535 540 Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln ValGly 545 550 555 Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala ArgTyr Tyr 560 565 570 Gly Phe 
Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe Phe Asn 575 580 585 Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu Tyr 590 595 600 Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn Lys 605 610 615 620 Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser Trp 625 630 635 Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val Tyr Asn 640 645 650 Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met Gly Val 655 660 665 Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His Ala Ala 670 675 680 Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr Asn 685 690 695 700 Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr Ser 705 710 715 Val Tyr Leu Asn Xaa Val Phe Ala Tyr 720 725
2603 base pairs nucleic acid single linear Genomic DNA Coding Sequence 210...2342 (A) NAME/KEY Signal Sequence (B) LOCATION 210...270 (D) OTHER INFORMATION 7 ATGACCTTTA TTGGTTTAAT ATTTGTTTAG AAATAACACA AAAACCTTTT TTTTTTTTTT 60 TGAAAGGGCA AAAACGCCTA ATTAATATCA AAATCCCATG AATTTATACT ATATTAACGA 120 AAGCTTGCGG TATGGTTTCA CCTAAAGACA CACTTCCGCA AGATTTACTA ACAATTTCAA 180 TCTTATTTCA AGTAATAAAA GGAGAAAAC ATG AAG AAA AAA TTT CTG TCA TTA 233 Met Lys Lys Lys Phe Leu Ser Leu -20 -15 ACC TTA GGT TCG CTT TTA GTT TCC GCT TTA AGC GCT GAA GAC AAC GGC 281 Thr Leu Gly Ser Leu Leu Val Ser Ala Leu Ser Ala Glu Asp Asn Gly -10 -5 1 TTT TTT GTG AGT GCG GGC TAT CAA ATC GGT GAA TCC GCT CAA ATG GTG 329 Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu Ser Ala Gln Met Val 5 10 15 20 AAA AAC ACT AAA GGC ATT CAA GAT CTT TCA GAT AGC TAT GAA AGA CTG 377 Lys Asn Thr Lys Gly Ile Gln Asp Leu Ser Asp Ser Tyr Glu Arg Leu 25 30 35 AAC AAT CTT TTA ACG AGT TAT AGT GCC CTA AAC ACT CTT ATT AGG CAG 425 Asn Asn Leu Leu Thr Ser Tyr Ser Ala Leu Asn Thr Leu Ile Arg Gln 40 45 50 TCC GCC GAC CCC AAC GCT ATC AAT AAC GCA AGG GGC AAT TTG AAC GCT 473 Ser Ala Asp Pro Asn Ala Ile Asn Asn Ala Arg Gly Asn Leu Asn Ala 55 60 65 AGT GCG AAG AAT TTG ATC AAT GAT AAA AAG AAT TCC CCG GCG TAT CAA 521 Ser Ala Lys Asn Leu Ile Asn Asp Lys Lys Asn Ser
Pro Ala Tyr Gln 7075 80 GCG GTG CTT TTA GCC TTG AAT GCG GCA GCG GGG TTG TGG CAA GTC ATG569 Ala Val Leu Leu Ala Leu Asn Ala Ala Ala Gly Leu Trp Gln Val Met 8590 95 100 AGC TAT TCG ATC AGC GTT TGT GGC CCT GGC TCT GAC AAA AAT AAAAAT 617 Ser Tyr Ser Ile Ser Val Cys Gly Pro Gly Ser Asp Lys Asn Lys Asn105 110 115 GGG GGC GTC CAA ACC TTT GAA AAT GTG CCG TCA AAT GGG GGG ACTACC 665 Gly Gly Val Gln Thr Phe Glu Asn Val Pro Ser Asn Gly Gly Thr Thr120 125 130 ATT GCT TGC GAT TCA TTT TAT GAA CCA GGA AAG TGG AGC GGT ATATCC 713 Ile Ala Cys Asp Ser Phe Tyr Glu Pro Gly Lys Trp Ser Gly Ile Ser135 140 145 ACT GAA AAT TAC GCA AAA ATC AAT AAA GCC TAT CAA ATC ATC CAAAAG 761 Thr Glu Asn Tyr Ala Lys Ile Asn Lys Ala Tyr Gln Ile Ile Gln Lys150 155 160 GCT TTT GGA GCA AGC GGG CAA GAT ATT CCT GCC TTA AGC GAC ACCAAA 809 Ala Phe Gly Ala Ser Gly Gln Asp Ile Pro Ala Leu Ser Asp Thr Lys165 170 175 180 GAA CTT AAT TTT GAA ATT AAA GGG AAA AAA AAT GAT AGC GTCCAG CCA 857 Glu Leu Asn Phe Glu Ile Lys Gly Lys Lys Asn Asp Ser Val GlnPro 185 190 195 GGA GAA AGA TGG AAA TTC CCA TGG ACT AAT GGA AAA TTT GTTTCA GTC 905 Gly Glu Arg Trp Lys Phe Pro Trp Thr Asn Gly Lys Phe Val SerVal 200 205 210 AAG TGG GTG AAT GGG AAG TAT GAA GAA ATT AAA GAA GAC ATCAAA GTG 953 Lys Trp Val Asn Gly Lys Tyr Glu Glu Ile Lys Glu Asp Ile LysVal 215 220 225 TCA AAT AAC GCT CAA GAG CTT TTA AAA CAG GCT AGC ACT ATTTTA ACC 1001 Ser Asn Asn Ala Gln Glu Leu Leu Lys Gln Ala Ser Thr Ile LeuThr 230 235 240 ACT CTT AAT GAA GCA TGC CCA TGG TTG AGT AAT GGT GGT GCAGGC AAT 1049 Thr Leu Asn Glu Ala Cys Pro Trp Leu Ser Asn Gly Gly Ala GlyAsn 245 250 255 260 GTG GCC GGT GGC AAT AGT TTA TGG GCC GGA ATA GAT AAAGGC GAC GGG 1097 Val Ala Gly Gly Asn Ser Leu Trp Ala Gly Ile Asp Lys GlyAsp Gly 265 270 275 AGC GCA TGC GGG ATT TTT AAA AAT GAA ATC AGC GCG ATTCAA GAC ATG 1145 Ser Ala Cys Gly Ile Phe Lys Asn Glu Ile Ser Ala Ile GlnAsp Met 280 285 290 ATC AAA AAC GCT GAA ATA GCC GTA GAG CAA TCC AAA ATCGTT ACC GCC 1193 Ile Lys Asn Ala Glu Ile Ala Val Glu Gln Ser Lys 
Ile ValThr Ala 295 300 305 AAC GCG CAA AAC CAG CAC AAC CTA GAC ACT GGG AAA GCATTC AAC CCC 1241 Asn Ala Gln Asn Gln His Asn Leu Asp Thr Gly Lys Ala PheAsn Pro 310 315 320 TAT AAA GAC GCC AAC TTC GCC CAA AGC ATG TTC GCT AACGCT AGA GCG 1289 Tyr Lys Asp Ala Asn Phe Ala Gln Ser Met Phe Ala Asn AlaArg Ala 325 330 335 340 CAA GCG GAG ATT TTA AAC CGC GCT CAA GCA GTG GTGAAG GAC TTT GAA 1337 Gln Ala Glu Ile Leu Asn Arg Ala Gln Ala Val Val LysAsp Phe Glu 345 350 355 AGA ATC CCT GCA GCG TTC GTG AAA GAC TCT TTA GGAGTA TGC CAT GAA 1385 Arg Ile Pro Ala Ala Phe Val Lys Asp Ser Leu Gly ValCys His Glu 360 365 370 AAG GGT AGC GAC GGC AAT CTC CGT GGC ACG CCA TCTGGC ACG GTT ACT 1433 Lys Gly Ser Asp Gly Asn Leu Arg Gly Thr Pro Ser GlyThr Val Thr 375 380 385 TCT AAC ACT TGG GGA GCC GGC TGC GCG TAT GTG GGAGAA ACC GTA ACG 1481 Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Gly GluThr Val Thr 390 395 400 AAT CTA AAA AAC AGC ATC GCT CAT TTT GGC GAC CAAGCG GAG CGA ATC 1529 Asn Leu Lys Asn Ser Ile Ala His Phe Gly Asp Gln AlaGlu Arg Ile 405 410 415 420 CAT AAT GCG CGA AAT CTC GCC TAC ACT TTA GCGAAT TTC AGC GGC CAG 1577 His Asn Ala Arg Asn Leu Ala Tyr Thr Leu Ala AsnPhe Ser Gly Gln 425 430 435 TAC AAA AAG CTA GGC GAA CAC TAT GAC AGC ATCACA GCG GCG CTC TCT 1625 Tyr Lys Lys Leu Gly Glu His Tyr Asp Ser Ile ThrAla Ala Leu Ser 440 445 450 AGC TTG CCT GAT GCG CAA TCT TTA CAA AAT GTGGTG AGC AAA AAG ACT 1673 Ser Leu Pro Asp Ala Gln Ser Leu Gln Asn Val ValSer Lys Lys Thr 455 460 465 AAC CCT AAC AGC CCG CAA GGC ATA CAG GAT AATTAC TAC ATT GAC TCC 1721 Asn Pro Asn Ser Pro Gln Gly Ile Gln Asp Asn TyrTyr Ile Asp Ser 470 475 480 AAC ATC CAT TCT CAA GTG CAA TCT AGG AGT CAAGAA CTC GGC AGT AAC 1769 Asn Ile His Ser Gln Val Gln Ser Arg Ser Gln GluLeu Gly Ser Asn 485 490 495 500 CCT TTC AGA CGC GCC GGG CTA ATC GCC GCTTCT ACC ACC AAT AAC GGC 1817 Pro Phe Arg Arg Ala Gly Leu Ile Ala Ala SerThr Thr Asn Asn Gly 505 510 515 GCG ATG AAT GGG ATT GGC TTT CAA GTG GGCTAT AAG CAA TTC TTT GGG 1865 Ala Met Asn Gly Ile Gly Phe Gln 
Val Gly Tyr Lys Gln Phe Phe Gly 520 525 530 AAA AAC AAA CGA TGG GGC GCG AGA TAC TAC GGC TTT GTG GAT TAC AAC 1913 Lys Asn Lys Arg Trp Gly Ala Arg Tyr Tyr Gly Phe Val Asp Tyr Asn 535 540 545 CAC ACC TAT AAC AAG TCC CAA TTT TTC AAC TCC GAT TCT GAT GTT TGG 1961 His Thr Tyr Asn Lys Ser Gln Phe Phe Asn Ser Asp Ser Asp Val Trp 550 555 560 ACT TAT GGC GTG GGG AGC GAT TTG TTA GTG AAT TTC ATC AAC GAT AAA 2009 Thr Tyr Gly Val Gly Ser Asp Leu Leu Val Asn Phe Ile Asn Asp Lys 565 570 575 580 GCC ACT AAA CAC AAT AAA ATT TCT TTT GGC GCG TTT GGC GGT ATC CAA 2057 Ala Thr Lys His Asn Lys Ile Ser Phe Gly Ala Phe Gly Gly Ile Gln 585 590 595 CTA GCC GGG ACT TCA TGG CTT AAT TCT CAG TAT GTG AAT TTA GCG AAT 2105 Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Tyr Val Asn Leu Ala Asn 600 605 610 GTG AAC AAT TAT TAT AAA GCT AAA ATC AAC ACC TCT AAC TTC CAA TTC 2153 Val Asn Asn Tyr Tyr Lys Ala Lys Ile Asn Thr Ser Asn Phe Gln Phe 615 620 625 TTA TTC AAT CTG GGC TTA AGG ACC AAT CTC GCC AGA AAT AAA AGA ATA 2201 Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu Ala Arg Asn Lys Arg Ile 630 635 640 GGC GCT GAT CAT AGC GCG CAA CAT GGC ATG GAA TTA GGC GTG AAG ATC 2249 Gly Ala Asp His Ser Ala Gln His Gly Met Glu Leu Gly Val Lys Ile 645 650 655 660 CCC ACG ATC AAC ACA AAT TAC TAT TCT TTG CTA GGC ACT ACC TTG CAA 2297 Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Leu Leu Gly Thr Thr Leu Gln 665 670 675 TAC AGA AGG CTT TAT AGC GTG TAT CTC AAC TAT GTG TTT GCT TAC TAAAA 2347 Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr 680 685 690 GCTTAAACTC CTTTTTAAAC TCCCTTTTTA GGGGGTTTAA TCTTTTTAAC TGACTTTTCT 2407 TTTAGCTTTT TTTAATTTTT TCCACCAAAC AAAGTTTTTT GACTTCAAGC GTTAATCACA 2467 AAAAATACTC AAAGGCGTTT TTTGCAATCT AAATAAAAAA TTAGCGTTAT TCAAGCGATC 2527 ATTTTAAACC ACCCAAGCAA GAAACCCCAA ACATCTTTAG CGTTCGCGCG CTCCACTAAC 2587 CAAAAAACGC CCCAAA 2603
711 amino acids amino acid single linear protein internal Signal Sequence 1...20 8 Met Lys Lys Lys Phe Leu Ser Leu Thr Leu Gly Ser Leu Leu Val Ser -20 -15 -10 -5 Ala Leu Ser Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly
Tyr Gln 1 5 10 Ile Gly Glu Ser Ala Gln Met Val Lys Asn Thr Lys Gly Ile Gln Asp 15 20 25 Leu Ser Asp Ser Tyr Glu Arg Leu Asn Asn Leu Leu Thr Ser Tyr Ser 30 35 40 Ala Leu Asn Thr Leu Ile Arg Gln Ser Ala Asp Pro Asn Ala Ile Asn 45 50 55 60 Asn Ala Arg Gly Asn Leu Asn Ala Ser Ala Lys Asn Leu Ile Asn Asp 65 70 75 Lys Lys Asn Ser Pro Ala Tyr Gln Ala Val Leu Leu Ala Leu Asn Ala 80 85 90 Ala Ala Gly Leu Trp Gln Val Met Ser Tyr Ser Ile Ser Val Cys Gly 95 100 105 Pro Gly Ser Asp Lys Asn Lys Asn Gly Gly Val Gln Thr Phe Glu Asn 110 115 120 Val Pro Ser Asn Gly Gly Thr Thr Ile Ala Cys Asp Ser Phe Tyr Glu 125 130 135 140 Pro Gly Lys Trp Ser Gly Ile Ser Thr Glu Asn Tyr Ala Lys Ile Asn 145 150 155 Lys Ala Tyr Gln Ile Ile Gln Lys Ala Phe Gly Ala Ser Gly Gln Asp 160 165 170 Ile Pro Ala Leu Ser Asp Thr Lys Glu Leu Asn Phe Glu Ile Lys Gly 175 180 185 Lys Lys Asn Asp Ser Val Gln Pro Gly Glu Arg Trp Lys Phe Pro Trp 190 195 200 Thr Asn Gly Lys Phe Val Ser Val Lys Trp Val Asn Gly Lys Tyr Glu 205 210 215 220 Glu Ile Lys Glu Asp Ile Lys Val Ser Asn Asn Ala Gln Glu Leu Leu 225 230 235 Lys Gln Ala Ser Thr Ile Leu Thr Thr Leu Asn Glu Ala Cys Pro Trp 240 245 250 Leu Ser Asn Gly Gly Ala Gly Asn Val Ala Gly Gly Asn Ser Leu Trp 255 260 265 Ala Gly Ile Asp Lys Gly Asp Gly Ser Ala Cys Gly Ile Phe Lys Asn 270 275 280 Glu Ile Ser Ala Ile Gln Asp Met Ile Lys Asn Ala Glu Ile Ala Val 285 290 295 300 Glu Gln Ser Lys Ile Val Thr Ala Asn Ala Gln Asn Gln His Asn Leu 305 310 315 Asp Thr Gly Lys Ala Phe Asn Pro Tyr Lys Asp Ala Asn Phe Ala Gln 320 325 330 Ser Met Phe Ala Asn Ala Arg Ala Gln Ala Glu Ile Leu Asn Arg Ala 335 340 345 Gln Ala Val Val Lys Asp Phe Glu Arg Ile Pro Ala Ala Phe Val Lys 350 355 360 Asp Ser Leu Gly Val Cys His Glu Lys Gly Ser Asp Gly Asn Leu Arg 365 370 375 380 Gly Thr Pro Ser Gly Thr Val Thr Ser Asn Thr Trp Gly Ala Gly Cys 385 390 395 Ala Tyr Val Gly Glu Thr Val Thr Asn Leu Lys Asn Ser Ile Ala His 400 405 410 Phe Gly Asp Gln Ala Glu Arg Ile His Asn Ala Arg Asn Leu Ala Tyr 415 420 425 Thr Leu Ala Asn Phe Ser Gly Gln
TyrLys Lys Leu Gly Glu His Tyr 430 435 440 Asp Ser Ile Thr Ala Ala Leu SerSer Leu Pro Asp Ala Gln Ser Leu 445 450 455 460 Gln Asn Val Val Ser LysLys Thr Asn Pro Asn Ser Pro Gln Gly Ile 465 470 475 Gln Asp Asn Tyr TyrIle Asp Ser Asn Ile His Ser Gln Val Gln Ser 480 485 490 Arg Ser Gln GluLeu Gly Ser Asn Pro Phe Arg Arg Ala Gly Leu Ile 495 500 505 Ala Ala SerThr Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Phe Gln 510 515 520 Val GlyTyr Lys Gln Phe Phe Gly Lys Asn Lys Arg Trp Gly Ala Arg 525 530 535 540Tyr Tyr Gly Phe Val Asp Tyr Asn His Thr Tyr Asn Lys Ser Gln Phe 545 550555 Phe Asn Ser Asp Ser Asp Val Trp Thr Tyr Gly Val Gly Ser Asp Leu 560565 570 Leu Val Asn Phe Ile Asn Asp Lys Ala Thr Lys His Asn Lys Ile Ser575 580 585 Phe Gly Ala Phe Gly Gly Ile Gln Leu Ala Gly Thr Ser Trp LeuAsn 590 595 600 Ser Gln Tyr Val Asn Leu Ala Asn Val Asn Asn Tyr Tyr LysAla Lys 605 610 615 620 Ile Asn Thr Ser Asn Phe Gln Phe Leu Phe Asn LeuGly Leu Arg Thr 625 630 635 Asn Leu Ala Arg Asn Lys Arg Ile Gly Ala AspHis Ser Ala Gln His 640 645 650 Gly Met Glu Leu Gly Val Lys Ile Pro ThrIle Asn Thr Asn Tyr Tyr 655 660 665 Ser Leu Leu Gly Thr Thr Leu Gln TyrArg Arg Leu Tyr Ser Val Tyr 670 675 680 Leu Asn Tyr Val Phe Ala Tyr 685690 2427 base pairs nucleic acid single linear Genomic DNA CodingSequence 232...2247 (A) NAME/KEY Signal Sequence (B) LOCATION 232...292(D) OTHER INFORMATION 9 AAAACGCGCA GCAAAAAATC TCTGTTAAGC TTTTATCATTAGCGTTCCAT TGAAACAAAA 60 TCTAAAAACC CTTTCCAATA CCACCCAAAC AAACGCGCAAAAAATGCAAA AATTCTAAAT 120 TTTCTCCAAA TGACAAAAAA AAAAAAAACG ATTTTATGCTACAATGCTTT TAATACATTC 180 TTACTTAATG TATAAAATCT CAATCACTCA ATTTAATTTCAAAGGATATT T ATG AAA 237 Met Lys -20 AAA ACC CTT TTA CTC TCT CTC TCT CTCTCT CTC TCG TCA TCG CTT TTA 285 Lys Thr Leu Leu Leu Ser Leu Ser Leu SerLeu Ser Ser Ser Leu Leu -15 -10 -5 AAC GCT GAA GAC AAC GGC TTT TTT ATCAGC GCG GGC TAT CAA ATC GGT 333 Asn Ala Glu Asp Asn Gly Phe Phe Ile SerAla Gly Tyr Gln Ile Gly 1 5 10 GAA GCC GCT CAA ATG GTG AAA AAC ACC GGCGAA TTG AAA AAA CTT TCA 
381 Glu Ala Ala Gln Met Val Lys Asn Thr Gly GluLeu Lys Lys Leu Ser 15 20 25 30 GAC ACT TAT GAG AAT TTG AGC AAC CTT TTAACC AAT TTT AAC AAC CTC 429 Asp Thr Tyr Glu Asn Leu Ser Asn Leu Leu ThrAsn Phe Asn Asn Leu 35 40 45 AAT CAA GCG GTA ACG AAC GCG AGC AGC CCT TCAGAA ATC AAT GCC ACG 477 Asn Gln Ala Val Thr Asn Ala Ser Ser Pro Ser GluIle Asn Ala Thr 50 55 60 ATC GAT AAT TTA AAA GCA AAC ACG CAA GGG CTG ATTGGC GAA AAA ACC 525 Ile Asp Asn Leu Lys Ala Asn Thr Gln Gly Leu Ile GlyGlu Lys Thr 65 70 75 AAT TCC CCG GCG TAT CAA GCG GTG TAT TTG GCG CTC AATGCG GCG GTG 573 Asn Ser Pro Ala Tyr Gln Ala Val Tyr Leu Ala Leu Asn AlaAla Val 80 85 90 GGG CTG TGG AAT GTG ATA GCC TAT AAT GTC CAA TGC GGT CCTGGT AAG 621 Gly Leu Trp Asn Val Ile Ala Tyr Asn Val Gln Cys Gly Pro GlyLys 95 100 105 110 AGT GGG GAT CAA AGC GTA ATT TTT GAT GGC CAA CCA GGACAT GAT TCA 669 Ser Gly Asp Gln Ser Val Ile Phe Asp Gly Gln Pro Gly HisAsp Ser 115 120 125 AGA TCC ATT AAT TGC AAT TTA ACC GGT TAT AAC AAC GGGGTT AGC GGC 717 Arg Ser Ile Asn Cys Asn Leu Thr Gly Tyr Asn Asn Gly ValSer Gly 130 135 140 CCT TTA TCC ATT GAC AAT TTT AAA ACG CTT AAT CAA GCTTAT CAA ACT 765 Pro Leu Ser Ile Asp Asn Phe Lys Thr Leu Asn Gln Ala TyrGln Thr 145 150 155 ATC CAA CAA GCT TTA AAA CAA GAT AGC GGA TTT CCT GTTTTG GAT AGT 813 Ile Gln Gln Ala Leu Lys Gln Asp Ser Gly Phe Pro Val LeuAsp Ser 160 165 170 AAA GGA AAA CAA GTA ACT ATA AAA ATA ACA ACA CAA ACTAAT GGA GCT 861 Lys Gly Lys Gln Val Thr Ile Lys Ile Thr Thr Gln Thr AsnGly Ala 175 180 185 190 AAT AAA AGT GAA ACT ACT ACT ACT ACT ACT ACT ACTAAT GAC GCT CAA 909 Asn Lys Ser Glu Thr Thr Thr Thr Thr Thr Thr Thr AsnAsp Ala Gln 195 200 205 ACC CTT TTG CAA GAA GCC AGT AAA ATG ATA AGC GTCCTC ACT ACA AAC 957 Thr Leu Leu Gln Glu Ala Ser Lys Met Ile Ser Val LeuThr Thr Asn 210 215 220 TGC CCA TGG GTA AAT ACC GCT CAT AAC TCA AAC GGGGGT GCA CCG TGG 1005 Cys Pro Trp Val Asn Thr Ala His Asn Ser Asn Gly GlyAla Pro Trp 225 230 235 AAT TTA AAT ACG ACA GGG AAT GTG TGT CAG GTT TTTGCC ACG GAG TTT 1053 Asn Leu 
Asn Thr Thr Gly Asn Val Cys Gln Val Phe AlaThr Glu Phe 240 245 250 AGC GCC GTT ACT AGC ATG ATC AAA AAC GCG CAA GAAATC GTA ACG CAA 1101 Ser Ala Val Thr Ser Met Ile Lys Asn Ala Gln Glu IleVal Thr Gln 255 260 265 270 GCT CAA AGC CTT AAC AAC CCG CAA AGC AAT CAAAAC GCG CCG AAA GAT 1149 Ala Gln Ser Leu Asn Asn Pro Gln Ser Asn Gln AsnAla Pro Lys Asp 275 280 285 TTC AAT CCT TAC ACC TCT GCT GAT AGG GCT TTCGCT CAA AAC ATG CTC 1197 Phe Asn Pro Tyr Thr Ser Ala Asp Arg Ala Phe AlaGln Asn Met Leu 290 295 300 AAT CAC GCG CAA GCG CAA GCC AAG ATG CTT GAACTA GCC GAT CAA ATG 1245 Asn His Ala Gln Ala Gln Ala Lys Met Leu Glu LeuAla Asp Gln Met 305 310 315 AAA AAA GAC CTT AAC ACT ATC CCA AAA CAA TTTATC ACA AAC TAC TTG 1293 Lys Lys Asp Leu Asn Thr Ile Pro Lys Gln Phe IleThr Asn Tyr Leu 320 325 330 GCA GCT TGC CGC AAT GGG GGT GGG ACA TTA CCTGAT GCA GGG GTT ACT 1341 Ala Ala Cys Arg Asn Gly Gly Gly Thr Leu Pro AspAla Gly Val Thr 335 340 345 350 TCT AAC ACT TGG GGG GCC GGT TGC GCC TATGTG GAA GAG ACG ATA ACC 1389 Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr ValGlu Glu Thr Ile Thr 355 360 365 GCC CTA AAT AAC AGC CTT GCG CAT TTT GGCACT CAA GCC GAT CAA ATC 1437 Ala Leu Asn Asn Ser Leu Ala His Phe Gly ThrGln Ala Asp Gln Ile 370 375 380 AAG CAA TCT GAG TTG TTG GCG CGC ACG ATACTT GAT TTT AGA GGC AGC 1485 Lys Gln Ser Glu Leu Leu Ala Arg Thr Ile LeuAsp Phe Arg Gly Ser 385 390 395 CTT AAG GAT TTA AAC AAC ACT TAT AAC AGCATC ACC ACG ACC GCT TCA 1533 Leu Lys Asp Leu Asn Asn Thr Tyr Asn Ser IleThr Thr Thr Ala Ser 400 405 410 AAC ACG CCC AAT TCC CCA TTC CTT AAA AATTTG ATA AGC CAA TCC ACT 1581 Asn Thr Pro Asn Ser Pro Phe Leu Lys Asn LeuIle Ser Gln Ser Thr 415 420 425 430 AAC CCT AAT AAC CCC GGG GGC TTA CAGGCC GTT TAT CAA GTC AAC CAA 1629 Asn Pro Asn Asn Pro Gly Gly Leu Gln AlaVal Tyr Gln Val Asn Gln 435 440 445 AGC GCT TAT TCG CAA TTA TTA AGC GCCACG CAA GAA TTA GGG CAT AAC 1677 Ser Ala Tyr Ser Gln Leu Leu Ser Ala ThrGln Glu Leu Gly His Asn 450 455 460 CCT TTC AGA CGC GTT GGC TTA ATC AGCTCT CAA ACC AAC AAC GGT 
GCG 1725 Pro Phe Arg Arg Val Gly Leu Ile Ser SerGln Thr Asn Asn Gly Ala 465 470 475 ATG AAT GGG ATC GGC GTG CAA ATA GGGTAT AAA CAA TTT TTT GGT GAA 1773 Met Asn Gly Ile Gly Val Gln Ile Gly TyrLys Gln Phe Phe Gly Glu 480 485 490 AAA AGA AGA TGG GGG TTA AGG TAT TATGGT TTT TTT GAT TAC AAC CAT 1821 Lys Arg Arg Trp Gly Leu Arg Tyr Tyr GlyPhe Phe Asp Tyr Asn His 495 500 505 510 GCT TAT ATC AAA TCC AGC TTT TTCAAC TCC GCC TCT GAT GTG TTC ACT 1869 Ala Tyr Ile Lys Ser Ser Phe Phe AsnSer Ala Ser Asp Val Phe Thr 515 520 525 TAT GGG GTA GGA ACA GAT GTC CTCTAT AAC TTT ATC AAC GAT AAA GCC 1917 Tyr Gly Val Gly Thr Asp Val Leu TyrAsn Phe Ile Asn Asp Lys Ala 530 535 540 ACC AAA AAC AAT AAG ATT TCT TTTGGG GTG TTT GGG GGG ATT GCG TTA 1965 Thr Lys Asn Asn Lys Ile Ser Phe GlyVal Phe Gly Gly Ile Ala Leu 545 550 555 GCT GGC ACT TCG TGG CTT AAT TCTCAA TAC GTG AAT TTA GCG ACA TTC 2013 Ala Gly Thr Ser Trp Leu Asn Ser GlnTyr Val Asn Leu Ala Thr Phe 560 565 570 AAT AAT TTT TAC AGC GCT AAA ATGAAT GTG GCG AAT TTC CAA TTC TTA 2061 Asn Asn Phe Tyr Ser Ala Lys Met AsnVal Ala Asn Phe Gln Phe Leu 575 580 585 590 TTC AAC TTG GGC TTG AGA ATGAAT CTC GCT AAA AAC AAA AAG AAA GCG 2109 Phe Asn Leu Gly Leu Arg Met AsnLeu Ala Lys Asn Lys Lys Lys Ala 595 600 605 AGC GAT CAT GTA GCT CAG CATGGC GTG GAA CTA GGC GTG AAG ATC CCT 2157 Ser Asp His Val Ala Gln His GlyVal Glu Leu Gly Val Lys Ile Pro 610 615 620 ACG ATC AAC ACG AAT TAC TATTCT TTG CTA GGC ACT CAA CTC CAA TAC 2205 Thr Ile Asn Thr Asn Tyr Tyr SerLeu Leu Gly Thr Gln Leu Gln Tyr 625 630 635 CGC AGG CTT TAT AGC GTG TATTTG AAT TAT GTG TTT GCT TAC TAATATCTG 2256 Arg Arg Leu Tyr Ser Val TyrLeu Asn Tyr Val Phe Ala Tyr 640 645 650 TCTTTTTGTG AAACTCCCTT TTTAAGGGATTTTTTTTGAA GCCTTTCTTT TTTTAAACCC 2316 TCTTTTTTGG GGGTCAAGCG TAAAATTCACCCCTATCCCT TTAAGAAAAT AAAATAAAAG 2376 AAAATGCGTT TTATAACAAA ATAAGATCTAAAACAATAAA ACAAAAACCC A 2427 672 amino acids amino acid single linearprotein internal Signal Sequence 1...20 10 Met Lys Lys Thr Leu Leu LeuSer Leu Ser Leu Ser Leu Ser 
Ser Ser -20 -15 -10 -5 Leu Leu Asn Ala Glu Asp Asn Gly Phe Phe Ile Ser Ala Gly Tyr Gln 1 5 10 Ile Gly Glu Ala Ala Gln Met Val Lys Asn Thr Gly Glu Leu Lys Lys 15 20 25 Leu Ser Asp Thr Tyr Glu Asn Leu Ser Asn Leu Leu Thr Asn Phe Asn 30 35 40 Asn Leu Asn Gln Ala Val Thr Asn Ala Ser Ser Pro Ser Glu Ile Asn 45 50 55 60 Ala Thr Ile Asp Asn Leu Lys Ala Asn Thr Gln Gly Leu Ile Gly Glu 65 70 75 Lys Thr Asn Ser Pro Ala Tyr Gln Ala Val Tyr Leu Ala Leu Asn Ala 80 85 90 Ala Val Gly Leu Trp Asn Val Ile Ala Tyr Asn Val Gln Cys Gly Pro 95 100 105 Gly Lys Ser Gly Asp Gln Ser Val Ile Phe Asp Gly Gln Pro Gly His 110 115 120 Asp Ser Arg Ser Ile Asn Cys Asn Leu Thr Gly Tyr Asn Asn Gly Val 125 130 135 140 Ser Gly Pro Leu Ser Ile Asp Asn Phe Lys Thr Leu Asn Gln Ala Tyr 145 150 155 Gln Thr Ile Gln Gln Ala Leu Lys Gln Asp Ser Gly Phe Pro Val Leu 160 165 170 Asp Ser Lys Gly Lys Gln Val Thr Ile Lys Ile Thr Thr Gln Thr Asn 175 180 185 Gly Ala Asn Lys Ser Glu Thr Thr Thr Thr Thr Thr Thr Thr Asn Asp 190 195 200 Ala Gln Thr Leu Leu Gln Glu Ala Ser Lys Met Ile Ser Val Leu Thr 205 210 215 220 Thr Asn Cys Pro Trp Val Asn Thr Ala His Asn Ser Asn Gly Gly Ala 225 230 235 Pro Trp Asn Leu Asn Thr Thr Gly Asn Val Cys Gln Val Phe Ala Thr 240 245 250 Glu Phe Ser Ala Val Thr Ser Met Ile Lys Asn Ala Gln Glu Ile Val 255 260 265 Thr Gln Ala Gln Ser Leu Asn Asn Pro Gln Ser Asn Gln Asn Ala Pro 270 275 280 Lys Asp Phe Asn Pro Tyr Thr Ser Ala Asp Arg Ala Phe Ala Gln Asn 285 290 295 300 Met Leu Asn His Ala Gln Ala Gln Ala Lys Met Leu Glu Leu Ala Asp 305 310 315 Gln Met Lys Lys Asp Leu Asn Thr Ile Pro Lys Gln Phe Ile Thr Asn 320 325 330 Tyr Leu Ala Ala Cys Arg Asn Gly Gly Gly Thr Leu Pro Asp Ala Gly 335 340 345 Val Thr Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Glu Glu Thr 350 355 360 Ile Thr Ala Leu Asn Asn Ser Leu Ala His Phe Gly Thr Gln Ala Asp 365 370 375 380 Gln Ile Lys Gln Ser Glu Leu Leu Ala Arg Thr Ile Leu Asp Phe Arg 385 390 395 Gly Ser Leu Lys Asp Leu Asn Asn Thr Tyr Asn Ser Ile Thr Thr Thr 400 405 410 Ala Ser Asn Thr Pro Asn Ser 
Pro Phe Leu Lys Asn Leu Ile Ser Gln415 420 425 Ser Thr Asn Pro Asn Asn Pro Gly Gly Leu Gln Ala Val Tyr GlnVal 430 435 440 Asn Gln Ser Ala Tyr Ser Gln Leu Leu Ser Ala Thr Gln GluLeu Gly 445 450 455 460 His Asn Pro Phe Arg Arg Val Gly Leu Ile Ser SerGln Thr Asn Asn 465 470 475 Gly Ala Met Asn Gly Ile Gly Val Gln Ile GlyTyr Lys Gln Phe Phe 480 485 490 Gly Glu Lys Arg Arg Trp Gly Leu Arg TyrTyr Gly Phe Phe Asp Tyr 495 500 505 Asn His Ala Tyr Ile Lys Ser Ser PhePhe Asn Ser Ala Ser Asp Val 510 515 520 Phe Thr Tyr Gly Val Gly Thr AspVal Leu Tyr Asn Phe Ile Asn Asp 525 530 535 540 Lys Ala Thr Lys Asn AsnLys Ile Ser Phe Gly Val Phe Gly Gly Ile 545 550 555 Ala Leu Ala Gly ThrSer Trp Leu Asn Ser Gln Tyr Val Asn Leu Ala 560 565 570 Thr Phe Asn AsnPhe Tyr Ser Ala Lys Met Asn Val Ala Asn Phe Gln 575 580 585 Phe Leu PheAsn Leu Gly Leu Arg Met Asn Leu Ala Lys Asn Lys Lys 590 595 600 Lys AlaSer Asp His Val Ala Gln His Gly Val Glu Leu Gly Val Lys 605 610 615 620Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Leu Leu Gly Thr Gln Leu 625 630635 Gln Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr 640645 650 2429 base pairs nucleic acid single linear Genomic DNA CodingSequence 205...2277 (A) NAME/KEY Signal Sequence (B) LOCATION 205...259(D) OTHER INFORMATION 11 TGAAAGAAGA CTGATTAGTC TTTCTTTTAG GGGCGATTCAAGCCTTAAAA GCCGGGTCAA 60 AATCCCCATT TTTCCCAATT TTTACAAAAA AAAAAAAAACAAAATCTCTA AAATTTAGAG 120 CTAAAATTAG CCATAAAATT CCATTTATTG CTTATAATATGAAGTTTCTT TGTATCAAAG 180 AAAAATCTAT TAAAAGGAGA AAAC ATG AAA AAA TCC CTCTTA CTC TCT CTT 231 Met Lys Lys Ser Leu Leu Leu Ser Leu -15 -10 TCT CTCATC GCT TCC TTA TCA AGA GCT GAA GAT GAC GGA TTT TAT ACG 279 Ser Leu IleAla Ser Leu Ser Arg Ala Glu Asp Asp Gly Phe Tyr Thr -5 1 5 AGT GTG GGCTAT CAG ATC GGT GAA GCG GTC CAA CAA GTG AAA AAC ACA 327 Ser Val Gly TyrGln Ile Gly Glu Ala Val Gln Gln Val Lys Asn Thr 10 15 20 GGA GCA TTG CAAAAT CTT GCA GAC AGA TAC GAT AAC TTA AAC AAC CTT 375 Gly Ala Leu Gln AsnLeu Ala Asp Arg Tyr Asp Asn Leu Asn Asn Leu 25 30 35 TTA 
AAC CAA TAC AATTAT TTA AAT TCC TTA GTC AAT TTA GCC AGC ACG 423 Leu Asn Gln Tyr Asn TyrLeu Asn Ser Leu Val Asn Leu Ala Ser Thr 40 45 50 55 CCG AGC GCG ATC ACCGGT GCG ATT GAT AAT TTA AGC TCA AGC GCG ATT 471 Pro Ser Ala Ile Thr GlyAla Ile Asp Asn Leu Ser Ser Ser Ala Ile 60 65 70 AAC CTC ACT AGC GCC ACCACC ACT TCC CCC GCC TAT CAA GCT GTG GCT 519 Asn Leu Thr Ser Ala Thr ThrThr Ser Pro Ala Tyr Gln Ala Val Ala 75 80 85 TTA GCG CTC AAT GCC GCT GTGGGC ATG TGG CAA GTC ATA GCC CTT TTT 567 Leu Ala Leu Asn Ala Ala Val GlyMet Trp Gln Val Ile Ala Leu Phe 90 95 100 ATT GGC TGT GGC CCT GGC CCTACC AAT AAT CAA AGC TAT CAA TCG TTT 615 Ile Gly Cys Gly Pro Gly Pro ThrAsn Asn Gln Ser Tyr Gln Ser Phe 105 110 115 GGT AAC ACA CCA GCC CTT AATGGG ACC ACC ACC ACT TGC AAT CAA GCA 663 Gly Asn Thr Pro Ala Leu Asn GlyThr Thr Thr Thr Cys Asn Gln Ala 120 125 130 135 TAT GGG ACA GGC CCT AATGGC ATC CTA TCT ATT GAT GAA TAC CAA AAA 711 Tyr Gly Thr Gly Pro Asn GlyIle Leu Ser Ile Asp Glu Tyr Gln Lys 140 145 150 CTC AAC CAA GCT TAT CAGATC ATC CAA ACC GCT TTA AAC CAA AAT CAA 759 Leu Asn Gln Ala Tyr Gln IleIle Gln Thr Ala Leu Asn Gln Asn Gln 155 160 165 GGG GGT GGG ATG CCT GCCTTG AAT GAC ACC ACC AAA ACA GGG GTA GTC 807 Gly Gly Gly Met Pro Ala LeuAsn Asp Thr Thr Lys Thr Gly Val Val 170 175 180 AAC ATA CAA CAA ACC AATTAT AGG ACC ACC ACA CAA AAC AAT ATC ATA 855 Asn Ile Gln Gln Thr Asn TyrArg Thr Thr Thr Gln Asn Asn Ile Ile 185 190 195 GAG CAT TAT TAT ACA GAGAAT GGG AAA GAG ATC CCA GTC TCT TAT TCA 903 Glu His Tyr Tyr Thr Glu AsnGly Lys Glu Ile Pro Val Ser Tyr Ser 200 205 210 215 GGC GGA TCA TCA TTCTCG CCT ACA ATA CAA TTG ACA TAC CAT AAT AAC 951 Gly Gly Ser Ser Phe SerPro Thr Ile Gln Leu Thr Tyr His Asn Asn 220 225 230 GCT GAA AAC CTT TTGCAA CAA GCC GCC ACT ATC ATG CAA GTC CTT ATT 999 Ala Glu Asn Leu Leu GlnGln Ala Ala Thr Ile Met Gln Val Leu Ile 235 240 245 ACT CAA AAG CCG CATGTG CAA ACG AGC AAT GGC GGT AAA GCG TGG GGG 1047 Thr Gln Lys Pro His ValGln Thr Ser Asn Gly Gly Lys Ala Trp Gly 250 255 260 TTG AGT TCT 
ACG CCTGGG AAT GTG ATG GAT ATT TTT GGT CCT TCT TTT 1095 Leu Ser Ser Thr Pro GlyAsn Val Met Asp Ile Phe Gly Pro Ser Phe 265 270 275 AAC GCT ATT AAT GAGATG ATT AAA AAC GCT CAA ACA GCC CTA GCA AAA 1143 Asn Ala Ile Asn Glu MetIle Lys Asn Ala Gln Thr Ala Leu Ala Lys 280 285 290 295 ACC CAA CAG CTTAAC GCT AAT GAA AAC GCC CAA ATC ACG CAA CCC AAC 1191 Thr Gln Gln Leu AsnAla Asn Glu Asn Ala Gln Ile Thr Gln Pro Asn 300 305 310 AAT TTC AAC CCCTAC ACC TCT AAA GAC AAA GGG TTC GCT CAA GAA ATG 1239 Asn Phe Asn Pro TyrThr Ser Lys Asp Lys Gly Phe Ala Gln Glu Met 315 320 325 CTC AAT AGA GCTGAA GCT CAA GCA GAG ATT TTA AAT TTA GCT AAG CAA 1287 Leu Asn Arg Ala GluAla Gln Ala Glu Ile Leu Asn Leu Ala Lys Gln 330 335 340 GTA GCG AAC AATTTC CAC AGC ATT CAA GGG CCT ATT CAA GGG GAT TTA 1335 Val Ala Asn Asn PheHis Ser Ile Gln Gly Pro Ile Gln Gly Asp Leu 345 350 355 GAA GAA TGT AAAGCA GGA TCG GCT GGC GTG ATC ACT AAT AAC ACT TGG 1383 Glu Glu Cys Lys AlaGly Ser Ala Gly Val Ile Thr Asn Asn Thr Trp 360 365 370 375 GGT TCA GGTTGC GCG TTT GTG AAA GAA ACT TTA AAC TCT TTA GAG CAA 1431 Gly Ser Gly CysAla Phe Val Lys Glu Thr Leu Asn Ser Leu Glu Gln 380 385 390 CAC ACC GCTTAT TAC GGC AAC CAG GTC AAT CAG GAT AGG GCT TTG GCT 1479 His Thr Ala TyrTyr Gly Asn Gln Val Asn Gln Asp Arg Ala Leu Ala 395 400 405 CAA ACC ATTTTG AAT TTT AAA GAA GCC CTT AAC ACC CTG AAT AAA GAC 1527 Gln Thr Ile LeuAsn Phe Lys Glu Ala Leu Asn Thr Leu Asn Lys Asp 410 415 420 TCA AAA GCGATC AAT AGC GGT ATC TCC AAC TTG CCT AAC GCT AAA TCT 1575 Ser Lys Ala IleAsn Ser Gly Ile Ser Asn Leu Pro Asn Ala Lys Ser 425 430 435 CTT CAA AACATG ACG CAT GCC ACT CAA AAC CCT AAT TCC CCA GAA GGT 1623 Leu Gln Asn MetThr His Ala Thr Gln Asn Pro Asn Ser Pro Glu Gly 440 445 450 455 CTG CTCACT TAT TCT TTG GAT TCA AGC AAA TAC AAC CAG CTC CAA ACC 1671 Leu Leu ThrTyr Ser Leu Asp Ser Ser Lys Tyr Asn Gln Leu Gln Thr 460 465 470 ATC GCGCAA GAA TTG GGC AAA AAC CCT TTC AGG CGC TTT GGC GTG ATT 1719 Ile Ala GlnGlu Leu Gly Lys Asn Pro Phe Arg Arg Phe Gly Val Ile 475 480 
485 GAC TTTCAA AAC AAC AAC GGC GCA ATG AAC GGG ATC GGC GTG CAA GTG 1767 Asp Phe GlnAsn Asn Asn Gly Ala Met Asn Gly Ile Gly Val Gln Val 490 495 500 GGT TATAAA CAA TTC TTT GGT AAA AAA AGG AAT TGG GGG TTA AGG TAT 1815 Gly Tyr LysGln Phe Phe Gly Lys Lys Arg Asn Trp Gly Leu Arg Tyr 505 510 515 TAT GGTTTC TTT GAT TAT AAC CAT GCT TAT ATC AAA TCT AAT TTT TTC 1863 Tyr Gly PhePhe Asp Tyr Asn His Ala Tyr Ile Lys Ser Asn Phe Phe 520 525 530 535 AACTCC GCT TCT GAT GTG TGG ACT TAT GGG GTG GGT ATG GAC GCT CTC 1911 Asn SerAla Ser Asp Val Trp Thr Tyr Gly Val Gly Met Asp Ala Leu 540 545 550 TATAAC TTC ATC AAC GAT AAA AAC ACC AAC TTT TTA GGC AAG AAC AAC 1959 Tyr AsnPhe Ile Asn Asp Lys Asn Thr Asn Phe Leu Gly Lys Asn Asn 555 560 565 AAGCTT TCA GTA GGG CTT TTT GGA GGC TTT GCG TTA GCC GGG ACT TCG 2007 Lys LeuSer Val Gly Leu Phe Gly Gly Phe Ala Leu Ala Gly Thr Ser 570 575 580 TGGCTT AAT TCC CAA CAA GTG AAT TTG ACC ATG ATG AAT GGC ATT TAT 2055 Trp LeuAsn Ser Gln Gln Val Asn Leu Thr Met Met Asn Gly Ile Tyr 585 590 595 AACGCT AAT GTC AGC ACT TCT AAC TTC CAA TTT TTG TTT GAT TTA GGC 2103 Asn AlaAsn Val Ser Thr Ser Asn Phe Gln Phe Leu Phe Asp Leu Gly 600 605 610 615TTG AGA ATG AAC CTC GCT AGG CCT AAG AAA AAA GAC AGC GAT CAT GCC 2151 LeuArg Met Asn Leu Ala Arg Pro Lys Lys Lys Asp Ser Asp His Ala 620 625 630GCT CAG CAT GGC ATT GAA CTA GGT TTT AAG ATC CCC ACG ATC AAC ACC 2199 AlaGln His Gly Ile Glu Leu Gly Phe Lys Ile Pro Thr Ile Asn Thr 635 640 645AAC TAT TAT TCT TTC ATG GGC GCT AAA CTA GAA TAC AGA AGG ATG TAT 2247 AsnTyr Tyr Ser Phe Met Gly Ala Lys Leu Glu Tyr Arg Arg Met Tyr 650 655 660AGC CTT TTT CTC AAT TAT GTG TTT GCT TAC TAAAAATTCT TTTTGAACCC CTC 2300Ser Leu Phe Leu Asn Tyr Val Phe Ala Tyr 665 670 TTTTTTTGGG GGAGTGTTGCAAAAATGCCC CCCTATTTGC TTGTGAGTTT TGGTTAAAAT 2360 TTTAGTTACC CACGCTTAAAAAGCGCCAAG CCTTTTACAC ACAACTCCTT TAATTTTGTT 2420 TTTAAGAAA 2429 691amino acids amino acid single linear protein internal Signal Sequence1...18 12 Met Lys Lys Ser Leu Leu Leu Ser Leu Ser Leu Ile Ala Ser 
LeuSer -15 -10 -5 Arg Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr GlnIle Gly 1 5 10 Glu Ala Val Gln Gln Val Lys Asn Thr Gly Ala Leu Gln AsnLeu Ala 15 20 25 30 Asp Arg Tyr Asp Asn Leu Asn Asn Leu Leu Asn Gln TyrAsn Tyr Leu 35 40 45 Asn Ser Leu Val Asn Leu Ala Ser Thr Pro Ser Ala IleThr Gly Ala 50 55 60 Ile Asp Asn Leu Ser Ser Ser Ala Ile Asn Leu Thr SerAla Thr Thr 65 70 75 Thr Ser Pro Ala Tyr Gln Ala Val Ala Leu Ala Leu AsnAla Ala Val 80 85 90 Gly Met Trp Gln Val Ile Ala Leu Phe Ile Gly Cys GlyPro Gly Pro 95 100 105 110 Thr Asn Asn Gln Ser Tyr Gln Ser Phe Gly AsnThr Pro Ala Leu Asn 115 120 125 Gly Thr Thr Thr Thr Cys Asn Gln Ala TyrGly Thr Gly Pro Asn Gly 130 135 140 Ile Leu Ser Ile Asp Glu Tyr Gln LysLeu Asn Gln Ala Tyr Gln Ile 145 150 155 Ile Gln Thr Ala Leu Asn Gln AsnGln Gly Gly Gly Met Pro Ala Leu 160 165 170 Asn Asp Thr Thr Lys Thr GlyVal Val Asn Ile Gln Gln Thr Asn Tyr 175 180 185 190 Arg Thr Thr Thr GlnAsn Asn Ile Ile Glu His Tyr Tyr Thr Glu Asn 195 200 205 Gly Lys Glu IlePro Val Ser Tyr Ser Gly Gly Ser Ser Phe Ser Pro 210 215 220 Thr Ile GlnLeu Thr Tyr His Asn Asn Ala Glu Asn Leu Leu Gln Gln 225 230 235 Ala AlaThr Ile Met Gln Val Leu Ile Thr Gln Lys Pro His Val Gln 240 245 250 ThrSer Asn Gly Gly Lys Ala Trp Gly Leu Ser Ser Thr Pro Gly Asn 255 260 265270 Val Met Asp Ile Phe Gly Pro Ser Phe Asn Ala Ile Asn Glu Met Ile 275280 285 Lys Asn Ala Gln Thr Ala Leu Ala Lys Thr Gln Gln Leu Asn Ala Asn290 295 300 Glu Asn Ala Gln Ile Thr Gln Pro Asn Asn Phe Asn Pro Tyr ThrSer 305 310 315 Lys Asp Lys Gly Phe Ala Gln Glu Met Leu Asn Arg Ala GluAla Gln 320 325 330 Ala Glu Ile Leu Asn Leu Ala Lys Gln Val Ala Asn AsnPhe His Ser 335 340 345 350 Ile Gln Gly Pro Ile Gln Gly Asp Leu Glu GluCys Lys Ala Gly Ser 355 360 365 Ala Gly Val Ile Thr Asn Asn Thr Trp GlySer Gly Cys Ala Phe Val 370 375 380 Lys Glu Thr Leu Asn Ser Leu Glu GlnHis Thr Ala Tyr Tyr Gly Asn 385 390 395 Gln Val Asn Gln Asp Arg Ala LeuAla Gln Thr Ile Leu Asn Phe Lys 400 405 410 Glu Ala Leu Asn Thr Leu AsnLys Asp 
Ser Lys Ala Ile Asn Ser Gly 415 420 425 430 Ile Ser Asn Leu ProAsn Ala Lys Ser Leu Gln Asn Met Thr His Ala 435 440 445 Thr Gln Asn ProAsn Ser Pro Glu Gly Leu Leu Thr Tyr Ser Leu Asp 450 455 460 Ser Ser LysTyr Asn Gln Leu Gln Thr Ile Ala Gln Glu Leu Gly Lys 465 470 475 Asn ProPhe Arg Arg Phe Gly Val Ile Asp Phe Gln Asn Asn Asn Gly 480 485 490 AlaMet Asn Gly Ile Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly 495 500 505510 Lys Lys Arg Asn Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn 515520 525 His Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp530 535 540 Thr Tyr Gly Val Gly Met Asp Ala Leu Tyr Asn Phe Ile Asn AspLys 545 550 555 Asn Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Val GlyLeu Phe 560 565 570 Gly Gly Phe Ala Leu Ala Gly Thr Ser Trp Leu Asn SerGln Gln Val 575 580 585 590 Asn Leu Thr Met Met Asn Gly Ile Tyr Asn AlaAsn Val Ser Thr Ser 595 600 605 Asn Phe Gln Phe Leu Phe Asp Leu Gly LeuArg Met Asn Leu Ala Arg 610 615 620 Pro Lys Lys Lys Asp Ser Asp His AlaAla Gln His Gly Ile Glu Leu 625 630 635 Gly Phe Lys Ile Pro Thr Ile AsnThr Asn Tyr Tyr Ser Phe Met Gly 640 645 650 Ala Lys Leu Glu Tyr Arg ArgMet Tyr Ser Leu Phe Leu Asn Tyr Val 655 660 665 670 Phe Ala Tyr 2270base pairs nucleic acid single linear Genomic DNA Coding Sequence130...2049 (A) NAME/KEY Signal Sequence (B) LOCATION 130...193 (D) OTHERINFORMATION 13 ATTGAGCGCA TCAAAACACC CTAAAACTTT TTTGAAATCC AATAAATTTATGTTATAATT 60 AAACGCATTG TAAATAAATT CTCATTTTGA TACATTTTTA CAATAAAACATTACTTTAAG 120 GAACATCTT ATG AAA AAA ACG AAA AAA ACG ATT CTG CTT TCT CTAACT CTC 171 Met Lys Lys Thr Lys Lys Thr Ile Leu Leu Ser Leu Thr Leu -20-15 -10 GCG GCG TCA TTG CTC CAT GCT GAA GAC AAC GGC GTT TTT TTA AGC GTG219 Ala Ala Ser Leu Leu His Ala Glu Asp Asn Gly Val Phe Leu Ser Val -5 15 GGT TAT CAA ATC GGT GAA GCG GTT CAA AAA GTG AAA AAC GCC GAC AAG 267Gly Tyr Gln Ile Gly Glu Ala Val Gln Lys Val Lys Asn Ala Asp Lys 10 15 2025 GTG CAA AAA CTT TCA GAC ACT TAT GAA CAA TTA AGC CGG CTT TTA ACC 315Val Gln Lys Leu Ser Asp Thr Tyr 
Glu Gln Leu Ser Arg Leu Leu Thr 30 35 40AAC GAT AAT GGC ACA AAC TCA AAG ACA AGC GCG CAA ATC AAC CAA GCG 363 AsnAsp Asn Gly Thr Asn Ser Lys Thr Ser Ala Gln Ile Asn Gln Ala 45 50 55 GTTAAT AAT TTG AAC GAA CGC GCA AAA ACT TTA GCC GGT GGG ACA ACC 411 Val AsnAsn Leu Asn Glu Arg Ala Lys Thr Leu Ala Gly Gly Thr Thr 60 65 70 AAT TCCCCT GCC TAT CAA GCC ACG CTT TTA GCG TTG AGA TCG GTG TTA 459 Asn Ser ProAla Tyr Gln Ala Thr Leu Leu Ala Leu Arg Ser Val Leu 75 80 85 GGG CTA TGGAAT AGC ATG GGT TAT GCG GTC ATA TGC GGA GGT TAT ACC 507 Gly Leu Trp AsnSer Met Gly Tyr Ala Val Ile Cys Gly Gly Tyr Thr 90 95 100 105 AAA AGTCCA GGC GAA AAC AAT CAA AAA GAT TTC CAC TAC ACC GAT GAG 555 Lys Ser ProGly Glu Asn Asn Gln Lys Asp Phe His Tyr Thr Asp Glu 110 115 120 AAT GGCAAT GGC ACT ACA ATC AAT TGC GGT GGG AGC ACA AAT AGT AAT 603 Asn Gly AsnGly Thr Thr Ile Asn Cys Gly Gly Ser Thr Asn Ser Asn 125 130 135 GGC ACTCAT AGT TCT AGT GGC ACA AAT ACA TTA AAA GCA GAC AAA AAT 651 Gly Thr HisSer Ser Ser Gly Thr Asn Thr Leu Lys Ala Asp Lys Asn 140 145 150 GTT TCTCTA TCT ATT GAG CAA TAT GAA AAA ATC CAT GAA GCT TAT CAG 699 Val Ser LeuSer Ile Glu Gln Tyr Glu Lys Ile His Glu Ala Tyr Gln 155 160 165 ATT CTTTCA AAA GCT TTA AAA CAA GCC GGG CTT GCT CCT TTA AAT AGC 747 Ile Leu SerLys Ala Leu Lys Gln Ala Gly Leu Ala Pro Leu Asn Ser 170 175 180 185 AAAGGG GAA AAG TTA GAA GCG CAT GTA ACC ACA TCA AAA CCA GAA AAT 795 Lys GlyGlu Lys Leu Glu Ala His Val Thr Thr Ser Lys Pro Glu Asn 190 195 200 AATAGT CAA ACT AAA ACG ACA ACT TCT GTT ATT GAT ACG ACT AAT GAT 843 Asn SerGln Thr Lys Thr Thr Thr Ser Val Ile Asp Thr Thr Asn Asp 205 210 215 GCGCAA AAT CTT TTG ACT CAA GCG CAA ACG ATT GTC AAT ACC CTT AAA 891 Ala GlnAsn Leu Leu Thr Gln Ala Gln Thr Ile Val Asn Thr Leu Lys 220 225 230 GATTAT TGC CCC ATG TTG ATA GCG AAA TCT AGT AGT GAA AGT AGT GGC 939 Asp TyrCys Pro Met Leu Ile Ala Lys Ser Ser Ser Glu Ser Ser Gly 235 240 245 GCAGCT ACT ACA AAC GCC CCT TCA TGG CAA ACA GCC GGT GGC GGC AAA 987 Ala AlaThr Thr Asn Ala Pro Ser Trp Gln Thr Ala 
Gly Gly Gly Lys 250 255 260 265AAT TCA TGT GCG ACT TTT GGT GCG GAG TTT AGT GCC GCT TCA GAC ATG 1035 AsnSer Cys Ala Thr Phe Gly Ala Glu Phe Ser Ala Ala Ser Asp Met 270 275 280ATT AAT AAT GCG CAA AAA ATC GTT CAA GAA ACC CAA CAA CTC AGC GCC 1083 IleAsn Asn Ala Gln Lys Ile Val Gln Glu Thr Gln Gln Leu Ser Ala 285 290 295AAC CAA CCA AAA AAT ATC ACA CAA CCC CAT AAT CTC AAC CTT AAC ACC 1131 AsnGln Pro Lys Asn Ile Thr Gln Pro His Asn Leu Asn Leu Asn Thr 300 305 310CCT AGC AGT CTT ACG GCT TTA GCT CAA AAA ATG CTC AAA AAT GCG CAA 1179 ProSer Ser Leu Thr Ala Leu Ala Gln Lys Met Leu Lys Asn Ala Gln 315 320 325TCT CAA GCA GAA ATT TTA AAA CTA GCC AAT CAA GTG GAG AGC GAT TTT 1227 SerGln Ala Glu Ile Leu Lys Leu Ala Asn Gln Val Glu Ser Asp Phe 330 335 340345 AAC AAA CTT TCT TCA GGC CAT CTT AAA GAC TAC ATA GGG AAA TGC GAT 1275Asn Lys Leu Ser Ser Gly His Leu Lys Asp Tyr Ile Gly Lys Cys Asp 350 355360 GCG AGC GCT ATA AGC AGT GCG AAT ATG ACA ATG CAA AAT CAA AAG AAC 1323Ala Ser Ala Ile Ser Ser Ala Asn Met Thr Met Gln Asn Gln Lys Asn 365 370375 AAT TGG GGG AAC GGG TGT GCT GGC GTG GAA GAA ACT CTG TCT TCA TTA 1371Asn Trp Gly Asn Gly Cys Ala Gly Val Glu Glu Thr Leu Ser Ser Leu 380 385390 AAA ACA AGT GCC GCT GAT TTT AAC AAC CAA ACG CCA CAA ATC AAT CAA 1419Lys Thr Ser Ala Ala Asp Phe Asn Asn Gln Thr Pro Gln Ile Asn Gln 395 400405 GCG CAA AAC CTA GCC AAC ACC CTT ATT CAA GAA CTT GGC AAC AAC CCT 1467Ala Gln Asn Leu Ala Asn Thr Leu Ile Gln Glu Leu Gly Asn Asn Pro 410 415420 425 TTT AGG AAT ATG GGC ATG ATC GCT TCT TCA ACC ACG AAT AAC GGC GCC1515 Phe Arg Asn Met Gly Met Ile Ala Ser Ser Thr Thr Asn Asn Gly Ala 430435 440 TTG AAT GGC CTT GGG GTG CAA GTG GGT TAT AAG CAA TTT TTT GGG GAA1563 Leu Asn Gly Leu Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu 445450 455 AAG AAA AGA TGG GGG TTA AGG TAT TAT GGT TTC TTT GAT TAC AAC CAC1611 Lys Lys Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His 460465 470 GCC TAT ATC AAA TCC AAT TTC TTT AAC TCG GCT TCT GAT GTG TGG ACT1659 Ala Tyr Ile Lys Ser Asn Phe Phe 
Asn Ser Ala Ser Asp Val Trp Thr 475480 485 TAT GGG GTG GGC AGC GAT TTA TTG TTT AAT TTC ATC AAT GAT AAA AAC1707 Tyr Gly Val Gly Ser Asp Leu Leu Phe Asn Phe Ile Asn Asp Lys Asn 490495 500 505 ACC AAC TTT TTA GGC AAG AAT AAC AAG ATT TCA GTG GGA TTT TTTGGA 1755 Thr Asn Phe Leu Gly Lys Asn Asn Lys Ile Ser Val Gly Phe Phe Gly510 515 520 GGT ATC GCC TTA GCA GGG ACT TCA TGG CTT AAT TCT CAA TTC GTGAAT 1803 Gly Ile Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Phe Val Asn525 530 535 TTA AAA ACC ATC AGC AAT GTT TAT AGC GCT AAA GTG AAT ACG GCTAAC 1851 Leu Lys Thr Ile Ser Asn Val Tyr Ser Ala Lys Val Asn Thr Ala Asn540 545 550 TTC CAA TTT TTA TTC AAT TTG GGC TTG AGA ACC AAT CTC GCT AGACCT 1899 Phe Gln Phe Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu Ala Arg Pro555 560 565 AAG AAA AAA GAT AGT CAT CAT GCG GCT CAA CAT GGC ATG GAA TTGGGC 1947 Lys Lys Lys Asp Ser His His Ala Ala Gln His Gly Met Glu Leu Gly570 575 580 585 GTG AAA ATC CCT ACC ATT AAC ACG AAT TAT TAT TCT TTT CTAGAC ACT 1995 Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Leu AspThr 590 595 600 AAA CTA GAA TAT CGA AGG CTT TAT AGC GTG TAT CTC AAT TATGTG TTT 2043 Lys Leu Glu Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr ValPhe 605 610 615 GCC TAT TAAAAACCCT CTTTTTAAAA AAGGGGGGGC TTTAAAAAACCTCTAAAGAT AA 2101 Ala Tyr AAATTTTCAA AAAACAATCA TTAAACCCTA AAAAAGAAATTTTAAGGTAT AATGCTTTCG 2161 CCATTTTTAA TTTTCCATGG CAAACTCCTT TTTAGAATTTATCCCCATAA TCGCTCTTAT 2221 GGGGCGTTTG TTTTGCAACA ATCTTTTCGA AACTATCCAACAAGCTTTA 2270 640 amino acids amino acid single linear protein internalSignal Sequence 1...21 14 Met Lys Lys Thr Lys Lys Thr Ile Leu Leu SerLeu Thr Leu Ala Ala -20 -15 -10 Ser Leu Leu His Ala Glu Asp Asn Gly ValPhe Leu Ser Val Gly Tyr -5 1 5 10 Gln Ile Gly Glu Ala Val Gln Lys ValLys Asn Ala Asp Lys Val Gln 15 20 25 Lys Leu Ser Asp Thr Tyr Glu Gln LeuSer Arg Leu Leu Thr Asn Asp 30 35 40 Asn Gly Thr Asn Ser Lys Thr Ser AlaGln Ile Asn Gln Ala Val Asn 45 50 55 Asn Leu Asn Glu Arg Ala Lys Thr LeuAla Gly Gly Thr Thr Asn Ser 60 65 70 75 Pro 
Ala Tyr Gln Ala Thr Leu Leu Ala Leu Arg Ser Val Leu Gly Leu 80 85 90 Trp Asn Ser Met Gly Tyr Ala Val Ile Cys Gly Gly Tyr Thr Lys Ser 95 100 105 Pro Gly Glu Asn Asn Gln Lys Asp Phe His Tyr Thr Asp Glu Asn Gly 110 115 120 Asn Gly Thr Thr Ile Asn Cys Gly Gly Ser Thr Asn Ser Asn Gly Thr 125 130 135 His Ser Ser Ser Gly Thr Asn Thr Leu Lys Ala Asp Lys Asn Val Ser 140 145 150 155 Leu Ser Ile Glu Gln Tyr Glu Lys Ile His Glu Ala Tyr Gln Ile Leu 160 165 170 Ser Lys Ala Leu Lys Gln Ala Gly Leu Ala Pro Leu Asn Ser Lys Gly 175 180 185 Glu Lys Leu Glu Ala His Val Thr Thr Ser Lys Pro Glu Asn Asn Ser 190 195 200 Gln Thr Lys Thr Thr Thr Ser Val Ile Asp Thr Thr Asn Asp Ala Gln 205 210 215 Asn Leu Leu Thr Gln Ala Gln Thr Ile Val Asn Thr Leu Lys Asp Tyr 220 225 230 235 Cys Pro Met Leu Ile Ala Lys Ser Ser Ser Glu Ser Ser Gly Ala Ala 240 245 250 Thr Thr Asn Ala Pro Ser Trp Gln Thr Ala Gly Gly Gly Lys Asn Ser 255 260 265 Cys Ala Thr Phe Gly Ala Glu Phe Ser Ala Ala Ser Asp Met Ile Asn 270 275 280 Asn Ala Gln Lys Ile Val Gln Glu Thr Gln Gln Leu Ser Ala Asn Gln 285 290 295 Pro Lys Asn Ile Thr Gln Pro His Asn Leu Asn Leu Asn Thr Pro Ser 300 305 310 315 Ser Leu Thr Ala Leu Ala Gln Lys Met Leu Lys Asn Ala Gln Ser Gln 320 325 330 Ala Glu Ile Leu Lys Leu Ala Asn Gln Val Glu Ser Asp Phe Asn Lys 335 340 345 Leu Ser Ser Gly His Leu Lys Asp Tyr Ile Gly Lys Cys Asp Ala Ser 350 355 360 Ala Ile Ser Ser Ala Asn Met Thr Met Gln Asn Gln Lys Asn Asn Trp 365 370 375 Gly Asn Gly Cys Ala Gly Val Glu Glu Thr Leu Ser Ser Leu Lys Thr 380 385 390 395 Ser Ala Ala Asp Phe Asn Asn Gln Thr Pro Gln Ile Asn Gln Ala Gln 400 405 410 Asn Leu Ala Asn Thr Leu Ile Gln Glu Leu Gly Asn Asn Pro Phe Arg 415 420 425 Asn Met Gly Met Ile Ala Ser Ser Thr Thr Asn Asn Gly Ala Leu Asn 430 435 440 Gly Leu Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu Lys Lys 445 450 455 Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Ala Tyr 460 465 470 475 Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly 480 485 490 Val Gly Ser Asp Leu Leu Phe Asn Phe Ile 
Asn Asp Lys AsnThr Asn 495 500 505 Phe Leu Gly Lys Asn Asn Lys Ile Ser Val Gly Phe PheGly Gly Ile 510 515 520 Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln PheVal Asn Leu Lys 525 530 535 Thr Ile Ser Asn Val Tyr Ser Ala Lys Val AsnThr Ala Asn Phe Gln 540 545 550 555 Phe Leu Phe Asn Leu Gly Leu Arg ThrAsn Leu Ala Arg Pro Lys Lys 560 565 570 Lys Asp Ser His His Ala Ala GlnHis Gly Met Glu Leu Gly Val Lys 575 580 585 Ile Pro Thr Ile Asn Thr AsnTyr Tyr Ser Phe Leu Asp Thr Lys Leu 590 595 600 Glu Tyr Arg Arg Leu TyrSer Val Tyr Leu Asn Tyr Val Phe Ala Tyr 605 610 615 2248 base pairsnucleic acid single linear Genomic DNA Coding Sequence 173...2128 (A)NAME/KEY Signal Sequence (B) LOCATION 173...224 (D) OTHER INFORMATION 15TGGTTTTATC GTTACAAAAT TCAACATTTC AAAGATAAAT AAGTTAAAAT ACCCCAAAAT 60CTTTTTTTTT TTTTTGAAAT CCAATCAATT TATAGTAAAA TTAGGTTCAT TGTAAATATA 120TTATCACTTC ATGATATTCT TACAACAAAA ACATTACTTT AAGGAACATT TT ATG AAA 178Met Lys AAG ACA ATT CTG CTC TCT CTC TCT GCT TCA TCG CTC TTG CAC GCT GAA226 Lys Thr Ile Leu Leu Ser Leu Ser Ala Ser Ser Leu Leu His Ala Glu -15-10 -5 1 GAC AAC GGC TTT TTT GTG AGC GCC GGC TAT CAA ATC GGC GAA GCG GTG274 Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu Ala Val 5 1015 CAA ATG GTC AAA AAC ACC GGT GAA TTG AAA AAC TTG AAC GAA AAA TAC 322Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu Asn Glu Lys Tyr 20 25 30GAG CAA TTA AGC CAG TAT TTA AAT CAA GTG GCT TCG TTG AAG CAA AGC 370 GluGln Leu Ser Gln Tyr Leu Asn Gln Val Ala Ser Leu Lys Gln Ser 35 40 45 ATTCAA AAC GCC AAC AAC ATT GAG CTG GTC AAT AGC TCT TTA AAC TAT 418 Ile GlnAsn Ala Asn Asn Ile Glu Leu Val Asn Ser Ser Leu Asn Tyr 50 55 60 65 TTAAAA AGC TTT ACC AAC AAC AAC TAT AAC AGC ACC ACC CAA TCG CCC 466 Leu LysSer Phe Thr Asn Asn Asn Tyr Asn Ser Thr Thr Gln Ser Pro 70 75 80 ATC TTTAAT GCC GTG CAA GCC GTT ATC ACT TCG GTA TTG GGT TTT TGG 514 Ile Phe AsnAla Val Gln Ala Val Ile Thr Ser Val Leu Gly Phe Trp 85 90 95 AGT CTT TATGCG GGG AAT TAC TTC ACT TTT TTT GTG GGT AAA AAG GTG 562 Ser Leu Tyr AlaGly 
Asn Tyr Phe Thr Phe Phe Val Gly Lys Lys Val 100 105 110 GGT GAT AGTGGG CAA CCC GCT AGT GTC CAG GGT AAC CCT CCT TTT AAA 610 Gly Asp Ser GlyGln Pro Ala Ser Val Gln Gly Asn Pro Pro Phe Lys 115 120 125 ACG ATT ATAGAG AAC TGC TCA GGA ATT GAA AAC TGC GCT ATG GAT CAA 658 Thr Ile Ile GluAsn Cys Ser Gly Ile Glu Asn Cys Ala Met Asp Gln 130 135 140 145 ACC ACTTAT GAT AAG ATG AAA AAA CTC GCT GAA GAC CTC CAA GCG GCT 706 Thr Thr TyrAsp Lys Met Lys Lys Leu Ala Glu Asp Leu Gln Ala Ala 150 155 160 CAA ACAAAC TCT GCC ACT AAA GGC AAC AAT CTT TGC GCT TTA TCC GGG 754 Gln Thr AsnSer Ala Thr Lys Gly Asn Asn Leu Cys Ala Leu Ser Gly 165 170 175 TGT GCTGCA ACA GAC TCA ACA TCA AAC CCA CCA AAC TCA ACC GTG AGC 802 Cys Ala AlaThr Asp Ser Thr Ser Asn Pro Pro Asn Ser Thr Val Ser 180 185 190 AAC GCTCTT AAT TTG GCG CAA CAG CTT ATG GAT TTA ATC GCA AAC ACT 850 Asn Ala LeuAsn Leu Ala Gln Gln Leu Met Asp Leu Ile Ala Asn Thr 195 200 205 AAA ACGGCT ATG ATG TGG AAA AAT ATC GTC ATC AGT GGC GTT TCA AAC 898 Lys Thr AlaMet Met Trp Lys Asn Ile Val Ile Ser Gly Val Ser Asn 210 215 220 225 ACATCC GGT GCT ATC ACA TCC ACT AAT TAC CCA ACG CAA TAC GCG GTG 946 Thr SerGly Ala Ile Thr Ser Thr Asn Tyr Pro Thr Gln Tyr Ala Val 230 235 240 TTTAAC AAC ATT AAG GCG ATG ATA CCC ATT TTG CAA CAA GCG GTT ACG 994 Phe AsnAsn Ile Lys Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr 245 250 255 CTTTCT CAA AGC AAC CAC ACC CTA TCT GCT AGC TTG CAA GCT CAA GCC 1042 Leu SerGln Ser Asn His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala 260 265 270 ACAGGA TCT CAA ACA AAC CCT AAA TTC GCT AAA GAC ATC TAC ACT TTC 1090 Thr GlySer Gln Thr Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe 275 280 285 GCTCAA AAC CAA AAG CAA GTC ATC TCT TAC GCT CAA GAC ATT TTC AAC 1138 Ala GlnAsn Gln Lys Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn 290 295 300 305CTC TTT AAT TCT ATC CCT GCA GAG CAG TAT AAG TAT CTA GAG AAA GCT 1186 LeuPhe Asn Ser Ile Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys Ala 310 315 320TAC TTG AAA ATA CCC AAT GCG GGT TCA ACG CCT ACT AAC CCT TAC AGA 1234 TyrLeu Lys 
Ile Pro Asn Ala Gly Ser Thr Pro Thr Asn Pro Tyr Arg 325 330 335CAA GTG GTG AAT TTA AAC CAA GAA GTT CAG ACG ATT AAA AAC AAT GTG 1282 GlnVal Val Asn Leu Asn Gln Glu Val Gln Thr Ile Lys Asn Asn Val 340 345 350AGT TAT TAT GGT AAC CGG GTG GAT GCG GCT TTA AGC GTG GCT AGA GAT 1330 SerTyr Tyr Gly Asn Arg Val Asp Ala Ala Leu Ser Val Ala Arg Asp 355 360 365GTT TAT AAC CTA AAA TCC AAT CAA GCA GAA ATC GTA ACC GCC TAT AAC 1378 ValTyr Asn Leu Lys Ser Asn Gln Ala Glu Ile Val Thr Ala Tyr Asn 370 375 380385 GAC GCT AAG ACT TTG AGC GAA GAG ATT TCT AAA CTC CCG CAC AAT CAA 1426Asp Ala Lys Thr Leu Ser Glu Glu Ile Ser Lys Leu Pro His Asn Gln 390 395400 GTC AAT ACA AAA GAC ATT GTT ACA CTA CCT TAC GAT AAA AAC GCC CCA 1474Val Asn Thr Lys Asp Ile Val Thr Leu Pro Tyr Asp Lys Asn Ala Pro 405 410415 GCA GCA GGC CAA TCC AAC TAC CAA ATC AAC CCA GAG CAG CAA TCC AAT 1522Ala Ala Gly Gln Ser Asn Tyr Gln Ile Asn Pro Glu Gln Gln Ser Asn 420 425430 CTT AAC CAA GCT TTA GCA GCG ATG AGC AAT AAC CCC TTT AAA AAA GTG 1570Leu Asn Gln Ala Leu Ala Ala Met Ser Asn Asn Pro Phe Lys Lys Val 435 440445 GGC ATG ATC AGC TCT CAA AAC AAT AAC GGC GCT TTG AAC GGG CTT GGC 1618Gly Met Ile Ser Ser Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly 450 455460 465 GTG CAA GTG GGT TAT AAG CAA TTC TTT GGC GAA AGC AAA AGA TGG GGG1666 Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly 470475 480 TTA AGG TAT TAC GGA TTC TTT GAT TAC AAC CAC GGC TAC ATC AAA TCC1714 Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser 485490 495 AGC TTC TTT AAC TCT TCT TCT GAT ATA TGG ACT TAT GGC GGT GGG AGC1762 Ser Phe Phe Asn Ser Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser 500505 510 GAT TTG TTA GTG AAT ATT ATC AAC GAT AGC ATC ACA AGA AAG AAC AAC1810 Asp Leu Leu Val Asn Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn 515520 525 AAG CTC TCC GTG GGT CTT TTT GGA GGC ATC CAA CTA GCA GGG ACT ACA1858 Lys Leu Ser Val Gly Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr Thr 530535 540 545 TGG CTT AAT TCT CAA TAC GTG AAT TTA ACC GCG TTC AAT AAC CCTTAC 
1906 Trp Leu Asn Ser Gln Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro Tyr550 555 560 AGC GCG AAA GTC AAT GCT ACC AAT TTC CAA TTC TTG TTC AAT CTCGGC 1954 Ser Ala Lys Val Asn Ala Thr Asn Phe Gln Phe Leu Phe Asn Leu Gly565 570 575 TTG AGG ACG AAT CTC GCT ACA GCT AGG AAA AAA GAC AGC GAA CATTCC 2002 Leu Arg Thr Asn Leu Ala Thr Ala Arg Lys Lys Asp Ser Glu His Ser580 585 590 GCG CAA CAT GGC ATT GAA TTG GGT ATT AAA ATC CCC ACC ATT ACCACG 2050 Ala Gln His Gly Ile Glu Leu Gly Ile Lys Ile Pro Thr Ile Thr Thr595 600 605 AAT TAC TAT TCT TTT CTA GGC ACT CAA TTG CAA TAC AGA AGG CTCTAT 2098 Asn Tyr Tyr Ser Phe Leu Gly Thr Gln Leu Gln Tyr Arg Arg Leu Tyr610 615 620 625 AGC GTG TAT CTC AAT TAT GTG TTC GCT TAC TGAGTGATTCAAGCTCTCTT CTT 2151 Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr 630 635TAAGGGGGTT TAGAAAAATC GCAACGCCAA GCTTTTTATC GTTGGTGATA AAATCTACAA 2211AACTAACGGC GCGACAACAA ACCCTAACGC TACGCTC 2248 652 amino acids amino acidsingle linear protein internal Signal Sequence 1...17 16 Met Lys Lys ThrIle Leu Leu Ser Leu Ser Ala Ser Ser Leu Leu His -15 -10 -5 Ala Glu AspAsn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu 1 5 10 15 Ala ValGln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu Asn Glu 20 25 30 Lys TyrGlu Gln Leu Ser Gln Tyr Leu Asn Gln Val Ala Ser Leu Lys 35 40 45 Gln SerIle Gln Asn Ala Asn Asn Ile Glu Leu Val Asn Ser Ser Leu 50 55 60 Asn TyrLeu Lys Ser Phe Thr Asn Asn Asn Tyr Asn Ser Thr Thr Gln 65 70 75 Ser ProIle Phe Asn Ala Val Gln Ala Val Ile Thr Ser Val Leu Gly 80 85 90 95 PheTrp Ser Leu Tyr Ala Gly Asn Tyr Phe Thr Phe Phe Val Gly Lys 100 105 110Lys Val Gly Asp Ser Gly Gln Pro Ala Ser Val Gln Gly Asn Pro Pro 115 120125 Phe Lys Thr Ile Ile Glu Asn Cys Ser Gly Ile Glu Asn Cys Ala Met 130135 140 Asp Gln Thr Thr Tyr Asp Lys Met Lys Lys Leu Ala Glu Asp Leu Gln145 150 155 Ala Ala Gln Thr Asn Ser Ala Thr Lys Gly Asn Asn Leu Cys AlaLeu 160 165 170 175 Ser Gly Cys Ala Ala Thr Asp Ser Thr Ser Asn Pro ProAsn Ser Thr 180 185 190 Val Ser Asn Ala Leu Asn Leu Ala Gln Gln Leu MetAsp Leu Ile Ala 
195 200 205 Asn Thr Lys Thr Ala Met Met Trp Lys Asn IleVal Ile Ser Gly Val 210 215 220 Ser Asn Thr Ser Gly Ala Ile Thr Ser ThrAsn Tyr Pro Thr Gln Tyr 225 230 235 Ala Val Phe Asn Asn Ile Lys Ala MetIle Pro Ile Leu Gln Gln Ala 240 245 250 255 Val Thr Leu Ser Gln Ser AsnHis Thr Leu Ser Ala Ser Leu Gln Ala 260 265 270 Gln Ala Thr Gly Ser GlnThr Asn Pro Lys Phe Ala Lys Asp Ile Tyr 275 280 285 Thr Phe Ala Gln AsnGln Lys Gln Val Ile Ser Tyr Ala Gln Asp Ile 290 295 300 Phe Asn Leu PheAsn Ser Ile Pro Ala Glu Gln Tyr Lys Tyr Leu Glu 305 310 315 Lys Ala TyrLeu Lys Ile Pro Asn Ala Gly Ser Thr Pro Thr Asn Pro 320 325 330 335 TyrArg Gln Val Val Asn Leu Asn Gln Glu Val Gln Thr Ile Lys Asn 340 345 350Asn Val Ser Tyr Tyr Gly Asn Arg Val Asp Ala Ala Leu Ser Val Ala 355 360365 Arg Asp Val Tyr Asn Leu Lys Ser Asn Gln Ala Glu Ile Val Thr Ala 370375 380 Tyr Asn Asp Ala Lys Thr Leu Ser Glu Glu Ile Ser Lys Leu Pro His385 390 395 Asn Gln Val Asn Thr Lys Asp Ile Val Thr Leu Pro Tyr Asp LysAsn 400 405 410 415 Ala Pro Ala Ala Gly Gln Ser Asn Tyr Gln Ile Asn ProGlu Gln Gln 420 425 430 Ser Asn Leu Asn Gln Ala Leu Ala Ala Met Ser AsnAsn Pro Phe Lys 435 440 445 Lys Val Gly Met Ile Ser Ser Gln Asn Asn AsnGly Ala Leu Asn Gly 450 455 460 Leu Gly Val Gln Val Gly Tyr Lys Gln PhePhe Gly Glu Ser Lys Arg 465 470 475 Trp Gly Leu Arg Tyr Tyr Gly Phe PheAsp Tyr Asn His Gly Tyr Ile 480 485 490 495 Lys Ser Ser Phe Phe Asn SerSer Ser Asp Ile Trp Thr Tyr Gly Gly 500 505 510 Gly Ser Asp Leu Leu ValAsn Ile Ile Asn Asp Ser Ile Thr Arg Lys 515 520 525 Asn Asn Lys Leu SerVal Gly Leu Phe Gly Gly Ile Gln Leu Ala Gly 530 535 540 Thr Thr Trp LeuAsn Ser Gln Tyr Val Asn Leu Thr Ala Phe Asn Asn 545 550 555 Pro Tyr SerAla Lys Val Asn Ala Thr Asn Phe Gln Phe Leu Phe Asn 560 565 570 575 LeuGly Leu Arg Thr Asn Leu Ala Thr Ala Arg Lys Lys Asp Ser Glu 580 585 590His Ser Ala Gln His Gly Ile Glu Leu Gly Ile Lys Ile Pro Thr Ile 595 600605 Thr Thr Asn Tyr Tyr Ser Phe Leu Gly Thr Gln Leu Gln Tyr Arg Arg 610615 620 Leu Tyr Ser Val Tyr 
Leu Asn Tyr Val Phe Ala Tyr 625 630 635 2161base pairs nucleic acid single linear Genomic DNA Coding Sequence122...2056 (A) NAME/KEY Signal Sequence (B) LOCATION 122...179 (D) OTHERINFORMATION 17 CAAAAATCTT TTTTTTTTTT TTTTGAAATC CAATAAATTT ATGGTAAAGTTAAACATATT 60 GTAAATAAAT TTTAATTTCT ATTCATGTTT ACAATAAAAA AATTACTTTAAGGAACATTT 120 T ATG AAA AAG ACA ATT CTA CTC TCT CTC TCT CTC TCG CTT TCATCG CTC 169 Met Lys Lys Thr Ile Leu Leu Ser Leu Ser Leu Ser Leu Ser SerLeu -15 -10 -5 TTG CAC GCT GAA GAC AAC GGC TTT TTT GTG AGC GCC GGC TATCAA ATC 217 Leu His Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr GlnIle 1 5 10 GGC GAA CGG GTG CAA ATG GTC AAA AAC ACC GGC GAA TTG AAA AACTTG 265 Gly Glu Arg Val Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu15 20 25 AAC GAA AAA TAC GAG CAA TTA AGC CAA TCT TTA GCC CAA CTG GCT TCG313 Asn Glu Lys Tyr Glu Gln Leu Ser Gln Ser Leu Ala Gln Leu Ala Ser 3035 40 45 TTA AAA AAA AGC ATT CAA ACG GCG AAC AAC ATT CAG GCT GTC AAC AAT361 Leu Lys Lys Ser Ile Gln Thr Ala Asn Asn Ile Gln Ala Val Asn Asn 5055 60 GCT TTA AGC GAT TTA AAA AGC TTT GCG AGT AAC AAC CAC ACA AAC AAA409 Ala Leu Ser Asp Leu Lys Ser Phe Ala Ser Asn Asn His Thr Asn Lys 6570 75 GAA ACA TCG CCC ATC TAC AAC ACC GCG CAA GCT GTT ATC ACT TCA GTA457 Glu Thr Ser Pro Ile Tyr Asn Thr Ala Gln Ala Val Ile Thr Ser Val 8085 90 TTG GCT TTT TGG AGT CTT TAT GCA GGG AAC GCT ACC AGT TTT CAT GTG505 Leu Ala Phe Trp Ser Leu Tyr Ala Gly Asn Ala Thr Ser Phe His Val 95100 105 ACC GGT TTG AAT GAT GGA TCT AAT GCT CCT CTT GGA AGA ATC CAT CAA553 Thr Gly Leu Asn Asp Gly Ser Asn Ala Pro Leu Gly Arg Ile His Gln 110115 120 125 GAT GGG AAC TGC ACA GGA TTA CAA CAA TGT TTT ATG AAT AAA GAAACT 601 Asp Gly Asn Cys Thr Gly Leu Gln Gln Cys Phe Met Asn Lys Glu Thr130 135 140 TAT GAT AAA ATG AAA GCG CTT GCC GAA AAT CTC CAA AAA GCT CAAGGC 649 Tyr Asp Lys Met Lys Ala Leu Ala Glu Asn Leu Gln Lys Ala Gln Gly145 150 155 AAT CTC TGT GCC TTA TCA GAA TGC CCT AGC GAT CAA TTA AAT GGAAAC 697 Asn Leu Cys Ala Leu Ser Glu Cys Pro Ser Asp Gln 
Leu Asn Gly Asn160 165 170 AAT GGA AAC AAA ACT TCC ATG ACT AAA GCT CTT GAA ACC GCG CAACAG 745 Asn Gly Asn Lys Thr Ser Met Thr Lys Ala Leu Glu Thr Ala Gln Gln175 180 185 CTT ATG GAT TTA ATC GCA AAC ACT AAA ACG GCT ATG ATG TGG AAAAAT 793 Leu Met Asp Leu Ile Ala Asn Thr Lys Thr Ala Met Met Trp Lys Asn190 195 200 205 ATC GTC ATC GCA GGT GTT ACA AAC AGA CCC GGT GGT GCT GGCGCT ATC 841 Ile Val Ile Ala Gly Val Thr Asn Arg Pro Gly Gly Ala Gly AlaIle 210 215 220 ACA TCC ACT GGT CCT GTA ACC GAC TAT GCG GTG TTT AAC AACATT AAG 889 Thr Ser Thr Gly Pro Val Thr Asp Tyr Ala Val Phe Asn Asn IleLys 225 230 235 GCG ATG ATA CCC ATT TTG CAA CAA GCG GTT ACG CTT TCT CAAAGC AAC 937 Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr Leu Ser Gln SerAsn 240 245 250 CAC ACC CTA TCT GCT AGC TTG CAA GCT CAA GCC ACA GGA TCTCAA ACA 985 His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala Thr Gly Ser GlnThr 255 260 265 AAC CCT AAA TTC GCT AAA GAC ATC TAC ACT TTC GCT CAA AACCAA AAG 1033 Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe Ala Gln Asn GlnLys 270 275 280 285 CAA GTC ATC TCT TAC GCT CAA GAC ATT TTC AAC CTC TTTAAT TCT ATC 1081 Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn Leu Phe AsnSer Ile 290 295 300 CCT GCA GAG CAG TAT AAG TAT CTA GAG AAA GCT TAC TTGAAA ATA CCC 1129 Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys Ala Tyr Leu LysIle Pro 305 310 315 AAT GCG GGT TCA ACG CCT ACT AAC CCT TAC AGA CAA GTGGTG AAT TTA 1177 Asn Ala Gly Ser Thr Pro Thr Asn Pro Tyr Arg Gln Val ValAsn Leu 320 325 330 AAC CAA GAA GTT CAG ACG ATT AAA AAC AAT GTG AGT TATTAT GGT AAC 1225 Asn Gln Glu Val Gln Thr Ile Lys Asn Asn Val Ser Tyr TyrGly Asn 335 340 345 CGG GTG GAT GCG GCT TTA AGC GTG GCT AGA GAT GTT TATAAC CTA AAA 1273 Arg Val Asp Ala Ala Leu Ser Val Ala Arg Asp Val Tyr AsnLeu Lys 350 355 360 365 TCC AAT CAA GCA GAA ATC GTA ACC GCC TAT AAC GACGCT AAG ACT TTG 1321 Ser Asn Gln Ala Glu Ile Val Thr Ala Tyr Asn Asp AlaLys Thr Leu 370 375 380 AGC GAA GAG ATT TCT AAA CTC CCG CAC AAT CAA GTCAAT ACA AAA GAC 1369 Ser Glu Glu Ile Ser Lys Leu Pro His Asn 
Gln Val AsnThr Lys Asp 385 390 395 ATT GTT ACA CTA CCT TAC GAT AAA AAC GCC CCA GCAGCA GGC CAA TCC 1417 Ile Val Thr Leu Pro Tyr Asp Lys Asn Ala Pro Ala AlaGly Gln Ser 400 405 410 AAC TAC CAA ATC AAC CCA GAG CAG CAA TCC AAT CTTAAC CAA GCT TTA 1465 Asn Tyr Gln Ile Asn Pro Glu Gln Gln Ser Asn Leu AsnGln Ala Leu 415 420 425 GCA GCG ATG AGC AAT AAC CCC TTT AAA AAA GTG GGCATG ATC AGC TCT 1513 Ala Ala Met Ser Asn Asn Pro Phe Lys Lys Val Gly MetIle Ser Ser 430 435 440 445 CAA AAC AAT AAC GGC GCT TTG AAC GGG CTT GGCGTG CAA GTG GGT TAT 1561 Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly ValGln Val Gly Tyr 450 455 460 AAG CAA TTC TTT GGC GAA AGC AAA AGA TGG GGGTTA AGG TAT TAC GGA 1609 Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly LeuArg Tyr Tyr Gly 465 470 475 TTC TTT GAT TAC AAC CAC GGC TAC ATC AAA TCCAGC TTC TTT AAC TCT 1657 Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser SerPhe Phe Asn Ser 480 485 490 TCT TCT GAT ATA TGG ACT TAT GGC GGT GGG AGCGAT TTG TTA GTG AAT 1705 Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser AspLeu Leu Val Asn 495 500 505 ATT ATC AAC GAT AGC ATC ACA AGA AAG AAC AACAAG CTC TCC GTG GGT 1753 Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn LysLeu Ser Val Gly 510 515 520 525 CTT TTT GGA GGC ATC CAA CTA GCA GGG ACTACA TGG CTT AAT TCT CAA 1801 Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr ThrTrp Leu Asn Ser Gln 530 535 540 TAC GTG AAT TTA ACC GCG TTC AAT AAC CCTTAC AGC GCG AAA GTC AAT 1849 Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro TyrSer Ala Lys Val Asn 545 550 555 GCT ACC AAT TTC CAA TTC TTG TTC AAT CTCGGC TTG AGG ACG AAT CTC 1897 Ala Thr Asn Phe Gln Phe Leu Phe Asn Leu GlyLeu Arg Thr Asn Leu 560 565 570 GCT ACA GCT AGG AAA AAA GAC AGC GAA CATTCC GCG CAA CAT GGC ATT 1945 Ala Thr Ala Arg Lys Lys Asp Ser Glu His SerAla Gln His Gly Ile 575 580 585 GAA TTG GGT ATT AAA ATC CCC ACC ATT ACCACG AAT TAC TAT TCT TTT 1993 Glu Leu Gly Ile Lys Ile Pro Thr Ile Thr ThrAsn Tyr Tyr Ser Phe 590 595 600 605 CTA GGC ACT CAA TTG CAA TAC AGA AGGCTC TAT AGC GTG TAT CTC AAT 2041 Leu Gly Thr Gln Leu Gln 
Tyr Arg Arg LeuTyr Ser Val Tyr Leu Asn 610 615 620 TAT GTG TTC GCT TAT TAAAAAATCTTCTTTTTAAA ATAGGGGGAG CTTCATCAAA T 2097 Tyr Val Phe Ala Tyr 625CTATTTTGAT AGTTATCAAT ATTTGATGAA AATAAAGTCA AAAACAAAAT AAACCAAATC 2157ACCC 2161 645 amino acids amino acid single linear protein internalSignal Sequence 1...19 18 Met Lys Lys Thr Ile Leu Leu Ser Leu Ser LeuSer Leu Ser Ser Leu -15 -10 -5 Leu His Ala Glu Asp Asn Gly Phe Phe ValSer Ala Gly Tyr Gln Ile 1 5 10 Gly Glu Arg Val Gln Met Val Lys Asn ThrGly Glu Leu Lys Asn Leu 15 20 25 Asn Glu Lys Tyr Glu Gln Leu Ser Gln SerLeu Ala Gln Leu Ala Ser 30 35 40 45 Leu Lys Lys Ser Ile Gln Thr Ala AsnAsn Ile Gln Ala Val Asn Asn 50 55 60 Ala Leu Ser Asp Leu Lys Ser Phe AlaSer Asn Asn His Thr Asn Lys 65 70 75 Glu Thr Ser Pro Ile Tyr Asn Thr AlaGln Ala Val Ile Thr Ser Val 80 85 90 Leu Ala Phe Trp Ser Leu Tyr Ala GlyAsn Ala Thr Ser Phe His Val 95 100 105 Thr Gly Leu Asn Asp Gly Ser AsnAla Pro Leu Gly Arg Ile His Gln 110 115 120 125 Asp Gly Asn Cys Thr GlyLeu Gln Gln Cys Phe Met Asn Lys Glu Thr 130 135 140 Tyr Asp Lys Met LysAla Leu Ala Glu Asn Leu Gln Lys Ala Gln Gly 145 150 155 Asn Leu Cys AlaLeu Ser Glu Cys Pro Ser Asp Gln Leu Asn Gly Asn 160 165 170 Asn Gly AsnLys Thr Ser Met Thr Lys Ala Leu Glu Thr Ala Gln Gln 175 180 185 Leu MetAsp Leu Ile Ala Asn Thr Lys Thr Ala Met Met Trp Lys Asn 190 195 200 205Ile Val Ile Ala Gly Val Thr Asn Arg Pro Gly Gly Ala Gly Ala Ile 210 215220 Thr Ser Thr Gly Pro Val Thr Asp Tyr Ala Val Phe Asn Asn Ile Lys 225230 235 Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr Leu Ser Gln Ser Asn240 245 250 His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala Thr Gly Ser GlnThr 255 260 265 Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe Ala Gln AsnGln Lys 270 275 280 285 Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn LeuPhe Asn Ser Ile 290 295 300 Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys AlaTyr Leu Lys Ile Pro 305 310 315 Asn Ala Gly Ser Thr Pro Thr Asn Pro TyrArg Gln Val Val Asn Leu 320 325 330 Asn Gln Glu Val Gln Thr Ile Lys AsnAsn Val Ser Tyr 
Tyr Gly Asn 335 340 345 Arg Val Asp Ala Ala Leu Ser ValAla Arg Asp Val Tyr Asn Leu Lys 350 355 360 365 Ser Asn Gln Ala Glu IleVal Thr Ala Tyr Asn Asp Ala Lys Thr Leu 370 375 380 Ser Glu Glu Ile SerLys Leu Pro His Asn Gln Val Asn Thr Lys Asp 385 390 395 Ile Val Thr LeuPro Tyr Asp Lys Asn Ala Pro Ala Ala Gly Gln Ser 400 405 410 Asn Tyr GlnIle Asn Pro Glu Gln Gln Ser Asn Leu Asn Gln Ala Leu 415 420 425 Ala AlaMet Ser Asn Asn Pro Phe Lys Lys Val Gly Met Ile Ser Ser 430 435 440 445Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly Val Gln Val Gly Tyr 450 455460 Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly Leu Arg Tyr Tyr Gly 465470 475 Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser Ser Phe Phe Asn Ser480 485 490 Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser Asp Leu Leu ValAsn 495 500 505 Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn Lys Leu SerVal Gly 510 515 520 525 Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr Thr TrpLeu Asn Ser Gln 530 535 540 Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro TyrSer Ala Lys Val Asn 545 550 555 Ala Thr Asn Phe Gln Phe Leu Phe Asn LeuGly Leu Arg Thr Asn Leu 560 565 570 Ala Thr Ala Arg Lys Lys Asp Ser GluHis Ser Ala Gln His Gly Ile 575 580 585 Glu Leu Gly Ile Lys Ile Pro ThrIle Thr Thr Asn Tyr Tyr Ser Phe 590 595 600 605 Leu Gly Thr Gln Leu GlnTyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn 610 615 620 Tyr Val Phe Ala Tyr625 1799 base pairs nucleic acid single linear Genomic DNA CodingSequence 185...1633 (A) NAME/KEY Signal Sequence (B) LOCATION 185...233(D) OTHER INFORMATION 19 TACTCAAAAC ATTTTTCACT ATCAAAAACC TTTTTTTTAAATCCAAAAAA AAAGCAAAAT 60 TTCTTAATTT TTGCTCAATT TTATTAAAAA TTCAATAAATTTATGGCACA ATTTAAACTT 120 ATTGTAAATA AAGTTTCAAT TTGATACGAT TTTACAAACAAAACATTACT TTAAGGAACA 180 TTTT ATG AAA AAA ACG ATT TTA CTT TCT CTT ATGGTT TCA TCG CTC CTC 229 Met Lys Lys Thr Ile Leu Leu Ser Leu Met Val SerSer Leu Leu -15 -10 -5 GCT GAA AAT GAC GGC GTT TTT ATG AGC GTG GGC TATCAA ATC GGC GAA 277 Ala Glu Asn Asp Gly Val Phe Met Ser Val Gly Tyr GlnIle Gly Glu 1 5 10 15 GCG GTT CAA CAA GTG 
AAA AAC ACC GGC GAA ATC CAAAAA GTC TCC AAC 325 Ala Val Gln Gln Val Lys Asn Thr Gly Glu Ile Gln LysVal Ser Asn 20 25 30 GCT TAC GAA AAT TTG AAC AAT CTT TTA ACC CGC TAT AACGAA CTC AAA 373 Ala Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Asn GluLeu Lys 35 40 45 CAA ACG GCC TCT AAC ACC AAT TCA AGT ACC GCT CAA GCG ATTGAT AAT 421 Gln Thr Ala Ser Asn Thr Asn Ser Ser Thr Ala Gln Ala Ile AspAsn 50 55 60 CTA AAA GAG AGC GCT AGC CGA TTG AAA ACG ACC CCC AAT AGC GCTAAT 469 Leu Lys Glu Ser Ala Ser Arg Leu Lys Thr Thr Pro Asn Ser Ala Asn65 70 75 CAA GCC GTG TCT TCA GCG CTC AGC TCT GCG GTA GCC ATG TGG CAA GTA517 Gln Ala Val Ser Ser Ala Leu Ser Ser Ala Val Ala Met Trp Gln Val 8085 90 95 ATA GTC TCT AAT TTA GCC AAT AAC TCG CTA CCC ACT AGT GAA TAC AAC565 Ile Val Ser Asn Leu Ala Asn Asn Ser Leu Pro Thr Ser Glu Tyr Asn 100105 110 AAA ATC AAT GCG ATT TCT CAA TCG CTC CAA AAC ACC CTA GAA AAT AAA613 Lys Ile Asn Ala Ile Ser Gln Ser Leu Gln Asn Thr Leu Glu Asn Lys 115120 125 AAC AAT GAT CTT AAA ATT GAA AAT GAC TAC GAC CAT CTT TTA ACT CAA661 Asn Asn Asp Leu Lys Ile Glu Asn Asp Tyr Asp His Leu Leu Thr Gln 130135 140 GCT AGC ACC ATT ATT AAT ACC CTT CAA AGC CAA TGC CCA GGC ATA GAC709 Ala Ser Thr Ile Ile Asn Thr Leu Gln Ser Gln Cys Pro Gly Ile Asp 145150 155 GGA GGC AAT GGC AAA CCA TGG GGC ATT AAT GCA AGC GGG AAC GCA TGC757 Gly Gly Asn Gly Lys Pro Trp Gly Ile Asn Ala Ser Gly Asn Ala Cys 160165 170 175 AAT ATT TTT GGC AAC ACC TTT AAC GCC ATC ACT AGC ATG ATA GATAGC 805 Asn Ile Phe Gly Asn Thr Phe Asn Ala Ile Thr Ser Met Ile Asp Ser180 185 190 GCT AAA AAA GCC GCC GCA GAT GCC CGA AGA ACT GCC CCA GAA AGTCCA 853 Ala Lys Lys Ala Ala Ala Asp Ala Arg Arg Thr Ala Pro Glu Ser Pro195 200 205 AAC CAA CCA AGT GCG TTT AAC AAC GCT GAT TTC AAT AAA AAC CTTAAT 901 Asn Gln Pro Ser Ala Phe Asn Asn Ala Asp Phe Asn Lys Asn Leu Asn210 215 220 CAA GTC TCA AGC GTT ATT AAT GAC ACG ATC TCT TAC CTC AAA GGGGAC 949 Gln Val Ser Ser Val Ile Asn Asp Thr Ile Ser Tyr Leu Lys Gly Asp225 230 235 AAT TTA GCA ACC ATC TAC AAC ACC CTT CAA 
AAA ACG CCC GAT TCTAAA 997 Asn Leu Ala Thr Ile Tyr Asn Thr Leu Gln Lys Thr Pro Asp Ser Lys240 245 250 255 GGG TTT CAA AGT TTG GTG AGC CGA TCT AGC TAT AGT TAT TCCCTC AAC 1045 Gly Phe Gln Ser Leu Val Ser Arg Ser Ser Tyr Ser Tyr Ser LeuAsn 260 265 270 GAA ACC CAA TAT TCT GAA TTC CAA ACT ACC ACC AAA GAG TTTGGC CAT 1093 Glu Thr Gln Tyr Ser Glu Phe Gln Thr Thr Thr Lys Glu Phe GlyHis 275 280 285 AAC CCT TTT AGA AGC GTG GGT TTA ATC AAC TCT CAA AGC AATAAC GGA 1141 Asn Pro Phe Arg Ser Val Gly Leu Ile Asn Ser Gln Ser Asn AsnGly 290 295 300 GCG ATG AAT GGC GTG GGC GTG CAA TTA GGC TAT AAG CAA TTCTTT GGG 1189 Ala Met Asn Gly Val Gly Val Gln Leu Gly Tyr Lys Gln Phe PheGly 305 310 315 AAA AAT AAA TTT TTT GGG ATC CGT TAT TAT GCC TTT TTT GATTAC AAC 1237 Lys Asn Lys Phe Phe Gly Ile Arg Tyr Tyr Ala Phe Phe Asp TyrAsn 320 325 330 335 CAT GCC TAT ATC AAA TCC AAC TTT TTC AAC TCC GCT TCCAAT GTT TTC 1285 His Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser AsnVal Phe 340 345 350 ACT TAT GGC GCA GGC AGT GAT CTT TTA TTG AAT TTC ATCAAT GGC GGA 1333 Thr Tyr Gly Ala Gly Ser Asp Leu Leu Leu Asn Phe Ile AsnGly Gly 355 360 365 TCC GAT AAA AAC CGC AAA GTC TCT TTT GGC ATT TTT GGAGGC ATC GCT 1381 Ser Asp Lys Asn Arg Lys Val Ser Phe Gly Ile Phe Gly GlyIle Ala 370 375 380 CTA GCA GGC ACG ACA TGG CTT AAT TCC CAA TTT ATG AATTTA AAA ACC 1429 Leu Ala Gly Thr Thr Trp Leu Asn Ser Gln Phe Met Asn LeuLys Thr 385 390 395 ACC AAT AGC GCC TAC AGC GCT AAG ATC AAC AAC ACC AATTTC CAA TTC 1477 Thr Asn Ser Ala Tyr Ser Ala Lys Ile Asn Asn Thr Asn PheGln Phe 400 405 410 415 TTA TTC AAT ACT GGT TTA AGG CTT CAA GGG ATT CACCAT GGC GTT GAA 1525 Leu Phe Asn Thr Gly Leu Arg Leu Gln Gly Ile His HisGly Val Glu 420 425 430 TTA GGC GTG AAA ATC CCC ACC ATC AAC ACG AAT TACTAT TCT TTC ATG 1573 Leu Gly Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr TyrSer Phe Met 435 440 445 GGC GCT AAA TTA GCA TAC CGA AGA CTT TAT AGC GTGTAT TTC AAT TAT 1621 Gly Ala Lys Leu Ala Tyr Arg Arg Leu Tyr Ser Val TyrPhe Asn Tyr 450 455 460 GTT TTG GCC TAT TGATATTGAA 
TCGGTTCTCA TTACTAATGAGGACAAAGCC AAACT 1678 Val Leu Ala Tyr 465 TTTTGGCTCT CAATGAATAACGGCATCATT TTACTTGACT TTTTACAAAA AACACACTAA 1738 AATTTCTTTT TCTTTTTTGAGCGAAATTCC AGATTAGCTC AGCGGTAGAG TAGGCGGCTG 1798 T 1799 483 amino acidsamino acid single linear protein internal Signal Sequence 1...16 20 MetLys Lys Thr Ile Leu Leu Ser Leu Met Val Ser Ser Leu Leu Ala -15 -10 -5Glu Asn Asp Gly Val Phe Met Ser Val Gly Tyr Gln Ile Gly Glu Ala 1 5 1015 Val Gln Gln Val Lys Asn Thr Gly Glu Ile Gln Lys Val Ser Asn Ala 20 2530 Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Asn Glu Leu Lys Gln 35 4045 Thr Ala Ser Asn Thr Asn Ser Ser Thr Ala Gln Ala Ile Asp Asn Leu 50 5560 Lys Glu Ser Ala Ser Arg Leu Lys Thr Thr Pro Asn Ser Ala Asn Gln 65 7075 80 Ala Val Ser Ser Ala Leu Ser Ser Ala Val Ala Met Trp Gln Val Ile 8590 95 Val Ser Asn Leu Ala Asn Asn Ser Leu Pro Thr Ser Glu Tyr Asn Lys100 105 110 Ile Asn Ala Ile Ser Gln Ser Leu Gln Asn Thr Leu Glu Asn LysAsn 115 120 125 Asn Asp Leu Lys Ile Glu Asn Asp Tyr Asp His Leu Leu ThrGln Ala 130 135 140 Ser Thr Ile Ile Asn Thr Leu Gln Ser Gln Cys Pro GlyIle Asp Gly 145 150 155 160 Gly Asn Gly Lys Pro Trp Gly Ile Asn Ala SerGly Asn Ala Cys Asn 165 170 175 Ile Phe Gly Asn Thr Phe Asn Ala Ile ThrSer Met Ile Asp Ser Ala 180 185 190 Lys Lys Ala Ala Ala Asp Ala Arg ArgThr Ala Pro Glu Ser Pro Asn 195 200 205 Gln Pro Ser Ala Phe Asn Asn AlaAsp Phe Asn Lys Asn Leu Asn Gln 210 215 220 Val Ser Ser Val Ile Asn AspThr Ile Ser Tyr Leu Lys Gly Asp Asn 225 230 235 240 Leu Ala Thr Ile TyrAsn Thr Leu Gln Lys Thr Pro Asp Ser Lys Gly 245 250 255 Phe Gln Ser LeuVal Ser Arg Ser Ser Tyr Ser Tyr Ser Leu Asn Glu 260 265 270 Thr Gln TyrSer Glu Phe Gln Thr Thr Thr Lys Glu Phe Gly His Asn 275 280 285 Pro PheArg Ser Val Gly Leu Ile Asn Ser Gln Ser Asn Asn Gly Ala 290 295 300 MetAsn Gly Val Gly Val Gln Leu Gly Tyr Lys Gln Phe Phe Gly Lys 305 310 315320 Asn Lys Phe Phe Gly Ile Arg Tyr Tyr Ala Phe Phe Asp Tyr Asn His 325330 335 Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asn Val Phe 
Thr340 345 350 Tyr Gly Ala Gly Ser Asp Leu Leu Leu Asn Phe Ile Asn Gly GlySer 355 360 365 Asp Lys Asn Arg Lys Val Ser Phe Gly Ile Phe Gly Gly IleAla Leu 370 375 380 Ala Gly Thr Thr Trp Leu Asn Ser Gln Phe Met Asn LeuLys Thr Thr 385 390 395 400 Asn Ser Ala Tyr Ser Ala Lys Ile Asn Asn ThrAsn Phe Gln Phe Leu 405 410 415 Phe Asn Thr Gly Leu Arg Leu Gln Gly IleHis His Gly Val Glu Leu 420 425 430 Gly Val Lys Ile Pro Thr Ile Asn ThrAsn Tyr Tyr Ser Phe Met Gly 435 440 445 Ala Lys Leu Ala Tyr Arg Arg LeuTyr Ser Val Tyr Phe Asn Tyr Val 450 455 460 Leu Ala Tyr 465 2338 basepairs nucleic acid single linear Genomic DNA Coding Sequence 146...2218(A) NAME/KEY Signal Sequence (B) LOCATION 146...200 (D) OTHERINFORMATION 21 ACTTAAAATT GTTTTTTTTT TTTTTCAAAA TATAAATTTT AAGCCAAAAATAAGCATTTT 60 ATGGTAAAAT GGCGAACTTT CATAAACATG ACTATTATGG GAATGTCATGGGAATGTGAA 120 GAAAAATCTA TTAAAA GGA GAA AAC ATG AAA AAA TCC CTC TTA CTCTCT CTT 172 Met Lys Lys Ser Leu Leu Leu Ser Leu -18 -15 -10 TCT CTC ATCGCT TCC TTA TCA AGA GCT GAA GAT GAC GGA TTT TAT ACG 220 Ser Leu Ile AlaSer Leu Ser Arg Ala Glu Asp Asp Gly Phe Tyr Thr -5 1 5 AGT GTG GGC TATCAG ATC GGT GAA GCG GTC CAA CAA GTG AAA AAC ACA 268 Ser Val Gly Tyr GlnIle Gly Glu Ala Val Gln Gln Val Lys Asn Thr 10 15 20 GGA GCA TTG CAA AATCTT GCA GAC AGA TAC GAT AAC TTA AAC AAC CTT 316 Gly Ala Leu Gln Asn LeuAla Asp Arg Tyr Asp Asn Leu Asn Asn Leu 25 30 35 TTA AAC CAA TAC AAT TATTTA AAT TCC TTA GTC AAT TTA GCC AGC ACG 364 Leu Asn Gln Tyr Asn Tyr LeuAsn Ser Leu Val Asn Leu Ala Ser Thr 40 45 50 55 CCG AGC GCG ATC ACC GGTGCG ATT GAT AAT TTA AGC TCA AGC GCG ATT 412 Pro Ser Ala Ile Thr Gly AlaIle Asp Asn Leu Ser Ser Ser Ala Ile 60 65 70 AAC CTC ACT AGC GCC ACC ACCACT TCC CCC GCC TAT CAA GCT GTG GCT 460 Asn Leu Thr Ser Ala Thr Thr ThrSer Pro Ala Tyr Gln Ala Val Ala 75 80 85 TTA GCG CTC AAT GCC GCT GTG GGCATG TGG CAA GTC ATA GCC CTT TTT 508 Leu Ala Leu Asn Ala Ala Val Gly MetTrp Gln Val Ile Ala Leu Phe 90 95 100 ATT GGC TGT GGC CCT GGC CCT ACCAAT AAT CAA AGC TAT CAA 
TCG TTT 556 Ile Gly Cys Gly Pro Gly Pro Thr AsnAsn Gln Ser Tyr Gln Ser Phe 105 110 115 GGT AAC ACA CCA GCC CTT AAT GGGACC ACC ACC ACT TGC AAT CAA GCA 604 Gly Asn Thr Pro Ala Leu Asn Gly ThrThr Thr Thr Cys Asn Gln Ala 120 125 130 135 TAT GGG ACA GGC CCT AAT GGCATC CTA TCT ATT GAT GAA TAC CAA AAA 652 Tyr Gly Thr Gly Pro Asn Gly IleLeu Ser Ile Asp Glu Tyr Gln Lys 140 145 150 CTC AAC CAA GCT TAT CAG ATCATC CAA ACC GCT TTA AAC CAA AAT CAA 700 Leu Asn Gln Ala Tyr Gln Ile IleGln Thr Ala Leu Asn Gln Asn Gln 155 160 165 GGG GGT GGG ATG CCT GCC TTGAAT GAC ACC ACC AAA ACA GGG GTA GTC 748 Gly Gly Gly Met Pro Ala Leu AsnAsp Thr Thr Lys Thr Gly Val Val 170 175 180 AAC ATA CAA CAA ACC AAT TATAGG ACC ACC ACA CAA AAC AAT ATC ATA 796 Asn Ile Gln Gln Thr Asn Tyr ArgThr Thr Thr Gln Asn Asn Ile Ile 185 190 195 GAG CAT TAT TAT ACA GAG AATGGG AAA GAG ATC CCA GTC TCT TAT TCA 844 Glu His Tyr Tyr Thr Glu Asn GlyLys Glu Ile Pro Val Ser Tyr Ser 200 205 210 215 GGC GGA TCA TCA TTC TCGCCT ACA ATA CAA TTG ACA TAC CAT AAT AAC 892 Gly Gly Ser Ser Phe Ser ProThr Ile Gln Leu Thr Tyr His Asn Asn 220 225 230 GCT GAA AAC CTT TTG CAACAA GCC GCC ACT ATC ATG CAA GTC CTT ATT 940 Ala Glu Asn Leu Leu Gln GlnAla Ala Thr Ile Met Gln Val Leu Ile 235 240 245 ACT CAA AAG CCG CAT GTGCAA ACG AGC AAT GGC GGT AAA GCG TGG GGG 988 Thr Gln Lys Pro His Val GlnThr Ser Asn Gly Gly Lys Ala Trp Gly 250 255 260 TTG AGT TCT ACG CCT GGGAAT GTG ATG GAT ATT TTT GGT CCT TCT TTT 1036 Leu Ser Ser Thr Pro Gly AsnVal Met Asp Ile Phe Gly Pro Ser Phe 265 270 275 AAC GCT ATT AAT GAG ATGATT AAA AAC GCT CAA ACA GCC CTA GCA AAA 1084 Asn Ala Ile Asn Glu Met IleLys Asn Ala Gln Thr Ala Leu Ala Lys 280 285 290 295 ACC CAA CAG CTT AACGCT AAT GAA AAC GCC CAA ATC ACG CAA CCC AAC 1132 Thr Gln Gln Leu Asn AlaAsn Glu Asn Ala Gln Ile Thr Gln Pro Asn 300 305 310 AAT TTC AAC CCC TACACC TCT AAA GAC AAA GGG TTC GCT CAA GAA ATG 1180 Asn Phe Asn Pro Tyr ThrSer Lys Asp Lys Gly Phe Ala Gln Glu Met 315 320 325 CTC AAT AGA GCT GAAGCT CAA GCA GAG ATT TTA AAT TTA 
GCT AAG CAA 1228 Leu Asn Arg Ala Glu AlaGln Ala Glu Ile Leu Asn Leu Ala Lys Gln 330 335 340 GTA GCG AAC AAT TTCCAC AGC ATT CAA GGG CCT ATT CAA GGG GAT TTA 1276 Val Ala Asn Asn Phe HisSer Ile Gln Gly Pro Ile Gln Gly Asp Leu 345 350 355 GAA GAA TGT AAA GCAGGA TCG GCT GGC GTG ATC ACT AAT AAC ACT TGG 1324 Glu Glu Cys Lys Ala GlySer Ala Gly Val Ile Thr Asn Asn Thr Trp 360 365 370 375 GGT TCA GGT TGCGCG TTT GTG AAA GAA ACT TTA AAC TCT TTA GAG CAA 1372 Gly Ser Gly Cys AlaPhe Val Lys Glu Thr Leu Asn Ser Leu Glu Gln 380 385 390 CAC ACC GCT TATTAC GGC AAC CAG GTC AAT CAG GAT AGG GCT TTG GCT 1420 His Thr Ala Tyr TyrGly Asn Gln Val Asn Gln Asp Arg Ala Leu Ala 395 400 405 CAA ACC ATT TTGAAT TTT AAA GAA GCC CTT AAC ACC CTG AAT AAA GAC 1468 Gln Thr Ile Leu AsnPhe Lys Glu Ala Leu Asn Thr Leu Asn Lys Asp 410 415 420 TCA AAA GCG ATCAAT AGC GGT ATC TCC AAC TTG CCT AAC GCT AAA TCT 1516 Ser Lys Ala Ile AsnSer Gly Ile Ser Asn Leu Pro Asn Ala Lys Ser 425 430 435 CTT CAA AAC ATGACG CAT GCC ACT CAA AAC CCT AAT TCC CCA GAA GGT 1564 Leu Gln Asn Met ThrHis Ala Thr Gln Asn Pro Asn Ser Pro Glu Gly 440 445 450 455 CTG CTC ACTTAT TCT TTG GAT TCA AGC AAA TAC AAC CAG CTC CAA ACC 1612 Leu Leu Thr TyrSer Leu Asp Ser Ser Lys Tyr Asn Gln Leu Gln Thr 460 465 470 ATC GCG CAAGAA TTG GGC AAA AAC CCT TTC AGG CGC TTT GGC GTG ATT 1660 Ile Ala Gln GluLeu Gly Lys Asn Pro Phe Arg Arg Phe Gly Val Ile 475 480 485 GAC TTT CAAAAC AAC AAC GGC GCA ATG AAC GGG ATC GGC GTG CAA GTG 1708 Asp Phe Gln AsnAsn Asn Gly Ala Met Asn Gly Ile Gly Val Gln Val 490 495 500 GGT TAT AAACAA TTC TTT GGT AAA AAA AGG AAT TGG GGG TTA AGG TAT 1756 Gly Tyr Lys GlnPhe Phe Gly Lys Lys Arg Asn Trp Gly Leu Arg Tyr 505 510 515 TAT GGT TTCTTT GAT TAT AAC CAT GCT TAT ATC AAA TCT AAT TTT TTC 1804 Tyr Gly Phe PheAsp Tyr Asn His Ala Tyr Ile Lys Ser Asn Phe Phe 520 525 530 535 AAC TCCGCT TCT GAT GTG TGG ACT TAT GGG GTG GGT ATG GAC GCT CTC 1852 Asn Ser AlaSer Asp Val Trp Thr Tyr Gly Val Gly Met Asp Ala Leu 540 545 550 TAT AACTTC ATC AAC GAT AAA AAC ACC 
AAC TTT TTA GGC AAG AAC AAC 1900 Tyr Asn PheIle Asn Asp Lys Asn Thr Asn Phe Leu Gly Lys Asn Asn 555 560 565 AAG CTTTCA GTA GGG CTT TTT GGA GGC TTT GCG TTA GCC GGG ACT TCG 1948 Lys Leu SerVal Gly Leu Phe Gly Gly Phe Ala Leu Ala Gly Thr Ser 570 575 580 TGG CTTAAT TCC CAA CAA GTG AAT TTG ACC ATG ATG AAT GGC ATT TAT 1996 Trp Leu AsnSer Gln Gln Val Asn Leu Thr Met Met Asn Gly Ile Tyr 585 590 595 AAC GCTAAT GTC AGC ACT TCT AAC TTC CAA TTT TTG TTT GAT TTA GGC 2044 Asn Ala AsnVal Ser Thr Ser Asn Phe Gln Phe Leu Phe Asp Leu Gly 600 605 610 615 TTGAGA ATG AAC CTC GCT AGG CCT AAG AAA AAA GAC AGC GAT CAT GCC 2092 Leu ArgMet Asn Leu Ala Arg Pro Lys Lys Lys Asp Ser Asp His Ala 620 625 630 GCTCAG CAT GGC ATT GAA CTA GGT TTT AAG ATC CCC ACG ATC AAC ACC 2140 Ala GlnHis Gly Ile Glu Leu Gly Phe Lys Ile Pro Thr Ile Asn Thr 635 640 645 AACTAT TAT TCT TTC ATG GGC GCT AAA CTA GAA TAC AGA AGG ATG TAT 2188 Asn TyrTyr Ser Phe Met Gly Ala Lys Leu Glu Tyr Arg Arg Met Tyr 650 655 660 AGCCTT TTT CTC AAT TAT GTG TTT GCT TAC TAAAAACTCT CTTTAAAAAA GGG 2241 SerLeu Phe Leu Asn Tyr Val Phe Ala Tyr 665 670 GTTTGTTTAA AAACGCTTAAAAGCATTTTT AAAATTAAGC AGTAAAGAGC CTAGATAATC 2301 TCTTGCAACC GCTCTCAAGCGATAAAATTA AAGTGAT 2338 691 amino acids amino acid single linear proteininternal Signal Sequence 1...18 22 Met Lys Lys Ser Leu Leu Leu Ser LeuSer Leu Ile Ala Ser Leu Ser -18 -15 -10 -5 Arg Ala Glu Asp Asp Gly PheTyr Thr Ser Val Gly Tyr Gln Ile Gly 1 5 10 Glu Ala Val Gln Gln Val LysAsn Thr Gly Ala Leu Gln Asn Leu Ala 15 20 25 30 Asp Arg Tyr Asp Asn LeuAsn Asn Leu Leu Asn Gln Tyr Asn Tyr Leu 35 40 45 Asn Ser Leu Val Asn LeuAla Ser Thr Pro Ser Ala Ile Thr Gly Ala 50 55 60 Ile Asp Asn Leu Ser SerSer Ala Ile Asn Leu Thr Ser Ala Thr Thr 65 70 75 Thr Ser Pro Ala Tyr GlnAla Val Ala Leu Ala Leu Asn Ala Ala Val 80 85 90 Gly Met Trp Gln Val IleAla Leu Phe Ile Gly Cys Gly Pro Gly Pro 95 100 105 110 Thr Asn Asn GlnSer Tyr Gln Ser Phe Gly Asn Thr Pro Ala Leu Asn 115 120 125 Gly Thr ThrThr Thr Cys Asn Gln Ala Tyr Gly Thr Gly 
Pro Asn Gly 130 135 140 Ile LeuSer Ile Asp Glu Tyr Gln Lys Leu Asn Gln Ala Tyr Gln Ile 145 150 155 IleGln Thr Ala Leu Asn Gln Asn Gln Gly Gly Gly Met Pro Ala Leu 160 165 170Asn Asp Thr Thr Lys Thr Gly Val Val Asn Ile Gln Gln Thr Asn Tyr 175 180185 190 Arg Thr Thr Thr Gln Asn Asn Ile Ile Glu His Tyr Tyr Thr Glu Asn195 200 205 Gly Lys Glu Ile Pro Val Ser Tyr Ser Gly Gly Ser Ser Phe SerPro 210 215 220 Thr Ile Gln Leu Thr Tyr His Asn Asn Ala Glu Asn Leu LeuGln Gln 225 230 235 Ala Ala Thr Ile Met Gln Val Leu Ile Thr Gln Lys ProHis Val Gln 240 245 250 Thr Ser Asn Gly Gly Lys Ala Trp Gly Leu Ser SerThr Pro Gly Asn 255 260 265 270 Val Met Asp Ile Phe Gly Pro Ser Phe AsnAla Ile Asn Glu Met Ile 275 280 285 Lys Asn Ala Gln Thr Ala Leu Ala LysThr Gln Gln Leu Asn Ala Asn 290 295 300 Glu Asn Ala Gln Ile Thr Gln ProAsn Asn Phe Asn Pro Tyr Thr Ser 305 310 315 Lys Asp Lys Gly Phe Ala GlnGlu Met Leu Asn Arg Ala Glu Ala Gln 320 325 330 Ala Glu Ile Leu Asn LeuAla Lys Gln Val Ala Asn Asn Phe His Ser 335 340 345 350 Ile Gln Gly ProIle Gln Gly Asp Leu Glu Glu Cys Lys Ala Gly Ser 355 360 365 Ala Gly ValIle Thr Asn Asn Thr Trp Gly Ser Gly Cys Ala Phe Val 370 375 380 Lys GluThr Leu Asn Ser Leu Glu Gln His Thr Ala Tyr Tyr Gly Asn 385 390 395 GlnVal Asn Gln Asp Arg Ala Leu Ala Gln Thr Ile Leu Asn Phe Lys 400 405 410Glu Ala Leu Asn Thr Leu Asn Lys Asp Ser Lys Ala Ile Asn Ser Gly 415 420425 430 Ile Ser Asn Leu Pro Asn Ala Lys Ser Leu Gln Asn Met Thr His Ala435 440 445 Thr Gln Asn Pro Asn Ser Pro Glu Gly Leu Leu Thr Tyr Ser LeuAsp 450 455 460 Ser Ser Lys Tyr Asn Gln Leu Gln Thr Ile Ala Gln Glu LeuGly Lys 465 470 475 Asn Pro Phe Arg Arg Phe Gly Val Ile Asp Phe Gln AsnAsn Asn Gly 480 485 490 Ala Met Asn Gly Ile Gly Val Gln Val Gly Tyr LysGln Phe Phe Gly 495 500 505 510 Lys Lys Arg Asn Trp Gly Leu Arg Tyr TyrGly Phe Phe Asp Tyr Asn 515 520 525 His Ala Tyr Ile Lys Ser Asn Phe PheAsn Ser Ala Ser Asp Val Trp 530 535 540 Thr Tyr Gly Val Gly Met Asp AlaLeu Tyr Asn Phe Ile Asn Asp Lys 545 550 555 Asn Thr 
Asn Phe Leu Gly LysAsn Asn Lys Leu Ser Val Gly Leu Phe 560 565 570 Gly Gly Phe Ala Leu AlaGly Thr Ser Trp Leu Asn Ser Gln Gln Val 575 580 585 590 Asn Leu Thr MetMet Asn Gly Ile Tyr Asn Ala Asn Val Ser Thr Ser 595 600 605 Asn Phe GlnPhe Leu Phe Asp Leu Gly Leu Arg Met Asn Leu Ala Arg 610 615 620 Pro LysLys Lys Asp Ser Asp His Ala Ala Gln His Gly Ile Glu Leu 625 630 635 GlyPhe Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly 640 645 650Ala Lys Leu Glu Tyr Arg Arg Met Tyr Ser Leu Phe Leu Asn Tyr Val 655 660665 670 Phe Ala Tyr 30 base pairs nucleic acid single linear Genomic DNA23 GCCNTCAAGG AGAAAACATG AAAAAAACCC 30 30 base pairs nucleic acid singlelinear Genomic DNA 24 GCCNGAAGAC GACGGCTTTT ACACAAGCGT 30 28 base pairsnucleic acid single linear Genomic DNA 25 GCCNAAAGCT TAGTAAGCGA ACACATAA28 33 base pairs nucleic acid single linear Genomic DNA 26 GCCNAAGGAGAAAAAACATG AAAAAACACA TCC 33 29 base pairs nucleic acid single linearGenomic DNA 27 GCCNGAAGAC GACGGCTTTT ACACAAGCG 29 30 base pairs nucleicacid single linear Genomic DNA 28 GCCNAACATT AGTAAGCGAA CACATAGTTC 30 33base pairs nucleic acid single linear Genomic DNA 29 GCCNAAGGAGAAAAAACATG AAAAAACACA TCC 33 30 base pairs nucleic acid single linearGenomic DNA 30 GCCNGAAGAC GACGGCTTTT ACACAAGCGT 30 27 base pairs nucleicacid single linear Genomic DNA 31 GCCNAAAAGC TTAGTAAGCG AACACAT 27 30base pairs nucleic acid single linear Genomic DNA 32 GCCNAAGGAGAAAACATGAA GAAAAAATTT 30 29 base pairs nucleic acid single linearGenomic DNA 33 GCCNGAAGAC AACGGCTTTT TTGTGAGTG 29 29 base pairs nucleicacid single linear Genomic DNA 34 GCCNAGCTTT TAGTAAGCAA ACACATAGT 29 29base pairs nucleic acid single linear Genomic DNA 35 GCCNAAGGATATTTATGAAA AAAACCCTT 29 29 base pairs nucleic acid single linear GenomicDNA 36 GCCNGAAGAC AACGGCTTTT TTATCAGCG 29 30 base pairs nucleic acidsingle linear Genomic DNA 37 GCCNGATATT AGTAAGCAAA CACATAATTC 30 31 basepairs nucleic acid single linear Genomic DNA 38 GCCNAAGGAG AAAACATGAAAAAATCCCTC T 31 30 base 
pairs nucleic acid single linear Genomic DNA 39GCCNGAAGAT GACGGATTTT ATACGAGTGT 30 30 base pairs nucleic acid singlelinear Genomic DNA 40 GCCNTTTTAG TAAGCAAACA CATAATTGAG 30 28 base pairsnucleic acid single linear Genomic DNA 41 GCCNAAGGAA CATCTTATGA AAAAAACG28 29 base pairs nucleic acid single linear Genomic DNA 42 GCCNGAAGACAACGGCGTTT TTTTAAGCG 29 29 base pairs nucleic acid single linear GenomicDNA 43 GCCNGGTTTT TAATAGGCAA ACACATAAT 29 30 base pairs nucleic acidsingle linear Genomic DNA 44 GCCNAAGGAA CATTTTATGA AAAAGACAAT 30 29 basepairs nucleic acid single linear Genomic DNA 45 GCCNGAAGAC AACGGCTTTTTTGTGAGCG 29 27 base pairs nucleic acid single linear Genomic DNA 46GCCNTCACTC AGTAAGCGAA CACATAA 27 29 base pairs nucleic acid singlelinear Genomic DNA 47 GCCNAAGGAA CATTTTATGA AAAAGACAA 29 29 base pairsnucleic acid single linear Genomic DNA 48 GCCNGAAGAC AACGGCTTTTTTGTGAGCG 29 30 base pairs nucleic acid single linear Genomic DNA 49GCCNTTTTAA TAAGCGAACA CATAAAAGAG 30 30 base pairs nucleic acid singlelinear Genomic DNA 50 GCCNAAGGAA CATTTTATGA AAAAAACGAT 30 29 base pairsnucleic acid single linear Genomic DNA 51 GCCNGAAAAT GACGGCGTTTTTATGAGCG 29 30 base pairs nucleic acid single linear Genomic DNA 52GCCNATATCA ATAGGCCAAA ACATAATTGA 30 30 base pairs nucleic acid singlelinear Genomic DNA 53 GCCNAAGGAG AAAACATGAA AAAATCCCTC 30 30 base pairsnucleic acid single linear Genomic DNA 54 GCCNGAAGAT GACGGATTTTATACGAGTGT 30 30 base pairs nucleic acid single linear Genomic DNA 55GCCNTTTTAG TAAGCAAACA CATAATTGAG 30 32 base pairs nucleic acid singlelinear 56 CGCGGATCCG AATCCAATTT AATCCAAAAA GG 32 33 base pairs nucleicacid single linear cDNA 57 CCGCTCGAGT TAAGTAAGCG AACACATATT CAA 33 20amino acids amino acid single linear peptide 58 Glu Asp Asp Gly Phe TyrThr Ser Val Gly Tyr Gln Ile Gly Glu Ala 1 5 10 15 Ala Gln Met Val 20 31base pairs nucleic acid single linear cDNA 59 CTGAATTCGA TTTCAAGGAGAAAACATGAA A 31 30 base pairs nucleic acid single linear cDNA 60CCGCTCGAGT TAGTAAGCGA 
ACACATAATT 30 32 base pairs nucleic acid singlelinear cDNA 61 CGCGGATCCG AATCCAATTT AATCCAAAAA GG 32 33 base pairsnucleic acid single linear cDNA 62 CCGCTCGAGT TAGTAAGCGA ACACATAGTT CAA33 29 base pairs nucleic acid single linear cDNA 63 CGCGGATCCGAAGTTTCTTT GTATCAAAG 29 33 base pairs nucleic acid single linear cDNA 64CCGCTCGAGT TAGTAAGCAA ACACATAATT GTG 33

What is claimed is:
1. An isolated polynucleotide that encodes: (i) a polypeptide comprising an amino acid sequence that is homologous to the amino acid sequence of a Helicobacter membrane-associated polypeptide, wherein said amino acid sequence of said Helicobacter membrane-associated polypeptide is selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 689 (GHPO 386); in SEQ ID NO:4, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 713 (GHPO 789); in SEQ ID NO:6, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 725 (GHPO 1516); in SEQ ID NO:8, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 691 (GHPO 1197); in SEQ ID NO:10, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 652 (GHPO 1180); in SEQ ID NO:12, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 896); in SEQ ID NO:14, beginning with an amino acid in any one of positions −21 to 5, preferably in position −21 or position 1, and ending with an amino acid in position 619 (GHPO 711); in SEQ ID NO:16, beginning with an amino acid in any one of positions −17 to 5, preferably in position −17 or position 1, and ending with an amino acid in position 635 (GHPO 190); in SEQ ID NO:18, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 626 (GHPO 185); in SEQ ID NO:20, beginning with an amino acid in any one of positions −16 to 5, preferably in position −16 or position 1, and ending with an amino acid in position 467 (GHPO 1417); and in SEQ ID NO:22, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 1414); or (ii) a derivative of the polypeptide.
2. An isolated polynucleotide that encodes: (i) a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in position −19 and ending with an amino acid in position 689 (GHPO 386); in SEQ ID NO:4, beginning with an amino acid in position −20 and ending with an amino acid in position 713 (GHPO 789); in SEQ ID NO:6, beginning with an amino acid in position −20 and ending with an amino acid in position 725 (GHPO 1516); in SEQ ID NO:8, beginning with an amino acid in position −20 and ending with an amino acid in position 691 (GHPO 1197); in SEQ ID NO:10, beginning with an amino acid in position −20 and ending with an amino acid in position 652 (GHPO 1180); in SEQ ID NO:12, beginning with an amino acid in position −18 and ending with an amino acid in position 673 (GHPO 896); in SEQ ID NO:14, beginning with an amino acid in position −21 and ending with an amino acid in position 619 (GHPO 711); in SEQ ID NO:16, beginning with an amino acid in position −17 and ending with an amino acid in position 635 (GHPO 190); in SEQ ID NO:18, beginning with an amino acid in position −19 and ending with an amino acid in position 626 (GHPO 185); in SEQ ID NO:20, beginning with an amino acid in position −16 and ending with an amino acid in position 467 (GHPO 1417); and in SEQ ID NO:22, beginning with an amino acid in position −18 and ending with an amino acid in position 673 (GHPO 1414); or (ii) a derivative of the polypeptide.
3. The isolated polynucleotide of claim 1, which encodes the mature form of: (i) a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 689 (GHPO 386); in SEQ ID NO:4, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 713 (GHPO 789); in SEQ ID NO:6, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 725 (GHPO 1516); in SEQ ID NO:8, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 691 (GHPO 1197); in SEQ ID NO:10, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 652 (GHPO 1180); in SEQ ID NO:12, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 896); in SEQ ID NO:14, beginning with an amino acid in any one of positions −21 to 5, preferably in position −21 or position 1, and ending with an amino acid in position 619 (GHPO 711); in SEQ ID NO:16, beginning with an amino acid in any one of positions −17 to 5, preferably in position −17 or position 1, and ending with an amino acid in position 635 (GHPO 190); in SEQ ID NO:18, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 626 (GHPO 185); in SEQ ID NO:20, beginning with an amino acid in any one of positions −16 to 5, preferably in position −16 or position 1, and ending with an amino acid in position 467 (GHPO 1417); and in SEQ ID NO:22, beginning with an amino acid in any one of positions −18 to 5, preferably in position −18 or position 1, and ending with an amino acid in position 673 (GHPO 1414); or (ii) a derivative of the polypeptide.
4. The isolated polynucleotide of claim 1, 2, or 3, wherein the polynucleotide is a DNA molecule.
5. The isolated polynucleotide of claim 1, which is a DNA molecule that can be amplified and/or cloned by polymerase chain reaction from a Helicobacter genome, using either: a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:23, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25 (unprocessed GHPO 386); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:26, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28 (unprocessed GHPO 789); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:29, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31 (unprocessed GHPO 1516); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:32, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34 (unprocessed GHPO 1197); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:35, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37 (unprocessed GHPO 1180); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:38, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40 (unprocessed GHPO 896); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:41, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43 (unprocessed GHPO 711); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:44, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46 (unprocessed GHPO 190); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:47, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49 (unprocessed GHPO 185); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:50, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52 (unprocessed GHPO 1417); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:53, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55 (unprocessed GHPO 1414); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:24, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25 (mature GHPO 386); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28 (mature GHPO 789); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31 (mature GHPO 1516); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34 (mature GHPO 1197); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37 (mature GHPO 1180); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40 (mature GHPO 896); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43 (mature GHPO 711); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46 (mature GHPO 190); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49 (mature GHPO 185); a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52 (mature GHPO 1417); or a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55 (mature GHPO 1414).
6. The isolated DNA molecule of claim 5, which can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter pylori genome.
7. The isolated polynucleotide of claim 1, which is a DNA molecule that encodes the mature form or a derivative of a polypeptide encoded by the DNA molecule of claim 5.
8. The isolated polynucleotide of claim 1, which is a DNA molecule that encodes the mature form or a derivative of a polypeptide encoded by the DNA molecule of claim 6.
9. A compound, in a substantially purified form, that is the mature form or a derivative of a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence of a polypeptide associated with the Helicobacter membrane, which is selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in position −19 and ending with an amino acid in position 689 (GHPO 386); in SEQ ID NO:4, beginning with an amino acid in position −20 and ending with an amino acid in position 713 (GHPO 789); in SEQ ID NO:6, beginning with an amino acid in position −20 and ending with an amino acid in position 725 (GHPO 1516); in SEQ ID NO:8, beginning with an amino acid in position −20 and ending with an amino acid in position 691 (GHPO 1197); in SEQ ID NO:10, beginning with an amino acid in position −20 and ending with an amino acid in position 652 (GHPO 1180); in SEQ ID NO:12, beginning with an amino acid in position −18 and ending with an amino acid in position 673 (GHPO 896); in SEQ ID NO:14, beginning with an amino acid in position −21 and ending with an amino acid in position 619 (GHPO 711); in SEQ ID NO:16, beginning with an amino acid in position −17 and ending with an amino acid in position 635 (GHPO 190); in SEQ ID NO:18, beginning with an amino acid in position −19 and ending with an amino acid in position 626 (GHPO 185); in SEQ ID NO:20, beginning with an amino acid in position −16 and ending with an amino acid in position 467 (GHPO 1417); and in SEQ ID NO:22, beginning with an amino acid in position −18 and ending with an amino acid in position 673 (GHPO 1414); or (ii) a derivative of the polypeptide.
10. The compound of claim 9, which is the mature form or a derivative of a polypeptide encoded by a DNA molecule of claim 5.
11. The compound of claim 9, which is the mature form or a derivative of a polypeptide encoded by a DNA molecule of claim 6.
12. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a compound of claim 9, 10, or 11.
13. The method of claim 12, further comprising administering an antibiotic, an antisecretory agent, a bismuth salt, or a combination thereof.
14. The method of claim 13, wherein said antibiotic is selected from the group consisting of amoxicillin, clarithromycin, tetracycline, metronidazole, and erythromycin.
15. The method of claim 13, wherein said bismuth salt is selected from the group consisting of bismuth subcitrate and bismuth subsalicylate.
16. The method of claim 13, wherein said antisecretory agent is a proton pump inhibitor.
17. The method of claim 16, wherein said proton pump inhibitor is selected from the group consisting of omeprazole, lansoprazole, and pantoprazole.
18. The method of claim 13, wherein said antisecretory agent is an H₂-receptor antagonist.
19. The method of claim 18, wherein said H₂-receptor antagonist is selected from the group consisting of ranitidine, cimetidine, famotidine, nizatidine, and roxatidine.
20. The method of claim 13, wherein said antisecretory agent is a prostaglandin analog.
21. The method of claim 20, wherein said prostaglandin analog is misoprostol or enprostil.
22. The method of claim 12, which further comprises administering a prophylactically or therapeutically effective amount of a second Helicobacter polypeptide or a derivative thereof.
23. The method of claim 22, wherein the second Helicobacter polypeptide is a Helicobacter urease, a subunit, or a derivative thereof.
24. A composition comprising a compound of claim 9, 10, or 11, together with a physiologically acceptable diluent or carrier.
25. The composition of claim 24, further comprising an adjuvant.
26. The composition of claim 24, further comprising a second Helicobacter polypeptide or a derivative thereof.
27. The composition of claim 26, wherein said second Helicobacter polypeptide is a Helicobacter urease, or a subunit or a derivative thereof.
28. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 1, 2, or 3.
29. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 5, 6, or 7.
30. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 8.
31. A composition comprising a viral vector, in the genome of which is inserted a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression in a mammalian cell and said viral vector being admixed with a physiologically acceptable diluent or carrier.
32. The composition of claim 31, wherein said viral vector is a poxvirus.
33. A composition that comprises a bacterial vector comprising a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression and said bacterial vector being admixed with a physiologically acceptable diluent or carrier.
34. The composition of claim 33, wherein said vector is selected from the group consisting of Shigella, Salmonella, Vibrio cholerae, Lactobacillus, Bacille bilié de Calmette-Guérin, and Streptococcus.
35. A composition comprising a polynucleotide of claim 1, 2, or 3, together with a physiologically acceptable diluent or carrier.
36. The composition of claim 35, wherein said polynucleotide is a DNA molecule that is inserted in a plasmid that is unable to replicate and to substantially integrate in a mammalian genome and is placed under conditions for expression in a mammalian cell.
37. An expression cassette comprising a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression in a procaryotic or eucaryotic cell.
38. A process for producing a compound of claim 9, which comprises culturing a procaryotic or eucaryotic cell transformed or transfected with an expression cassette of claim 37, and recovering said compound from the cell culture.
39. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of an antibody that binds to the compound of claim 9, 10, or 11.
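
For readers cross-referencing the sequence listing, the primer pairings recited in claim 5 follow a regular numbering pattern that can be sketched in a few lines of Python. This is an illustrative reading aid only, not part of the claims: the names `GHPO_NAMES`, `primer_pair`, and `protein_seq_id` are hypothetical, and the arithmetic simply restates the SEQ ID NO assignments given in claims 1 and 5 (antigens taken in the order in which the claims list them).

```python
from typing import Tuple

# Antigen designations in the order they are recited in claims 1 and 5.
GHPO_NAMES = [
    "GHPO 386", "GHPO 789", "GHPO 1516", "GHPO 1197", "GHPO 1180", "GHPO 896",
    "GHPO 711", "GHPO 190", "GHPO 185", "GHPO 1417", "GHPO 1414",
]


def primer_pair(name: str, mature: bool = False) -> Tuple[int, int]:
    """Return (5' primer SEQ ID NO, 3' primer SEQ ID NO) as paired in claim 5.

    Each antigen occupies three consecutive SEQ ID NOs starting at 23:
    the 5' primer for the unprocessed form, the 5' primer for the mature
    form, and a 3' primer shared by both forms.
    """
    k = GHPO_NAMES.index(name)      # raises ValueError for an unknown name
    base = 23 + 3 * k
    five_prime = base + 1 if mature else base
    three_prime = base + 2
    return five_prime, three_prime


def protein_seq_id(name: str) -> int:
    """Return the amino acid SEQ ID NO recited for the antigen in claim 1."""
    return 2 * (GHPO_NAMES.index(name) + 1)


if __name__ == "__main__":
    for name in GHPO_NAMES:
        print(name, protein_seq_id(name),
              primer_pair(name), primer_pair(name, mature=True))
```

For example, `primer_pair("GHPO 386")` gives SEQ ID NOs 23 and 25, and `primer_pair("GHPO 386", mature=True)` gives SEQ ID NOs 24 and 25, matching the first unprocessed and mature pairings in claim 5.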